(54) Teleconferencing system having a collaboration selection facility

(57) A teleconferencing system that integrates separate real-time and asynchronous networks (the former for
real-time audio and video, and the latter for control signals and textual, graphical and other data) in a manner
which closely approximates the experience of face-to-face collaboration. The system provides an audio/video
(AV) path 13 for carrying AV signals among the workstations, a video mosaic generator (36) for combining
images, and an audio summer or mixer 38. The system architecture is readily scalable to the largest enterprise
network environments; it accommodates differing levels of collaborative capabilities available to individual
users and permits high-quality audio and video capabilities (Figs 2A, 2B, 8C) to be readily superimposed onto
existing personal computers and workstations 12 and their interconnecting LANs 10 and WANs 15. In the case of
a plurality of geographically dispersed LANs 10 interconnected by a WAN 15, the demands made on the WAN
are significantly reduced by employing multi-hopping techniques, including avoiding the unnecessary
decompression of data at intermediate hops, as well as video mosaicing and cut-and-paste technology. This
application discloses the facility for adding a caller to a current video conference and selecting the type of
collaboration/communication to be carried out with the caller.
TELECONFERENCING SYSTEM

BACKGROUND OF THE INVENTION

The present invention relates to teleconferencing systems, and in
particular to computer-based teleconferencing systems for enhancing collaboration
between and among individuals who are separated by distance and/or time (referred to
herein as "distributed collaboration"). A system embodying the invention's goals can
replicate, in a desktop environment and to the maximum extent possible, the full range,
level and intensity of interpersonal communication and information sharing which would
occur if all the participants were together in the same room at the same time (referred to
herein as "face-to-face collaboration").

It is well known to behavioral scientists that interpersonal
communication involves a large number of subtle and complex visual cues, referred to by
names like "eye contact" and "body language", which provide additional information
over and above the spoken words and explicit gestures. These cues are, for the most
part, processed subconsciously by the participants, and often control the course of a
meeting.

In addition to spoken words, demonstrative gestures and behavioral
cues, collaboration often involves the sharing of visual information - e.g., printed
material such as articles, drawings, photographs, charts and graphs, as well as videotapes
and computer-based animations, visualizations and other displays - in such a way that
the participants can collectively and interactively examine, discuss, annotate and revise
the information. This combination of spoken words, gestures, visual cues and interactive
data sharing significantly enhances the effectiveness of collaboration in a variety of
contexts, such as "brainstorming" and problem solving sessions among professionals in a
particular field, consultations between one or more experts and one or more clients,
sensitive business or political negotiations, and the like. In distributed collaboration
settings, then, where the participants cannot be in the same place at the same time, the
beneficial effects of face-to-face collaboration will be realized only to the extent that each
of the remotely located participants can be "recreated" at each site.

To illustrate the difficulties inherent in reproducing the beneficial effects
of face-to-face collaboration in a distributed collaboration environment, consider the case
of decision-making in the fast-moving commodities trading markets, where many
thousands of dollars or pounds of profit (or loss) may depend on an expert trader making
the right decision within hours, or even minutes, of receiving a request from a distant
client. The expert requires immediate access to a wide range of potentially relevant
information such as financial data, historical pricing information, current price quotes,
newswire services, government policies and programs, economic forecasts, weather
reports, etc. Much of this information can be processed by the expert in isolation.
However, before making a decision to buy or sell, he or she will frequently need to
discuss the information with other experts, who may be geographically dispersed, and
with the client. One or more of these other experts may be in a meeting, on another call,
or otherwise temporarily unavailable. In this event, the expert must communicate
"asynchronously" to bridge time as well as distance.

As discussed below, prior art desktop videoconferencing systems
provide, at best, only a partial solution to the challenges of distributed collaboration in
real time, primarily because of their lack of high-quality video (which is necessary for
capturing the visual cues discussed above) and their limited data sharing capabilities.
Similarly, telephone answering machines, voice mail, fax machines and conventional
electronic mail systems provide incomplete solutions to the problems presented by
deferred (asynchronous) collaboration because they are totally incapable of
communicating visual cues, gestures, etc., and, like conventional videoconferencing
systems, are generally limited in the richness of the data that can be exchanged.

It has been proposed to extend traditional videoconferencing capabilities
from conference centers, where groups of participants must assemble in the same room,
to the desktop, where individual participants may remain in their office or home. Such a
system is disclosed in U.S. Patent No. 4,710,917 to Tompkins et al. for Video
Conferencing Network, issued on December 1, 1987. It has also been proposed to
augment such video conferencing systems with limited "video mail" facilities. However,
such dedicated videoconferencing systems (and extensions thereof) do not effectively
leverage the investment in existing embedded information infrastructures - such as
desktop personal computers and workstations, local area network (LAN) and wide area
network (WAN) environments, building wiring, etc. - to facilitate interactive sharing of
data in the form of text, images, charts, graphs, recorded video, screen displays and the
like. That is, they attempt to add computing capabilities to a videoconferencing system,
rather than adding multimedia and collaborative capabilities to the user's existing
computer system. Thus, while such systems may be useful in limited contexts, they do
not provide the capabilities required for maximally effective collaboration, and are not
cost-effective.

Conversely, audio and video capture and processing capabilities have
recently been integrated into desktop and portable personal computers and workstations
(hereinafter generically referred to as "workstations"). These capabilities have been used
primarily in desktop multimedia authoring systems for producing CD-ROM-based works.
While such systems are capable of processing, combining, and recording audio, video
and data locally (i.e., at the desktop), they do not adequately support networked
collaborative environments, principally due to the substantial bandwidth requirements for
real-time transmission of high-quality, digitized audio and full-motion video, which
preclude conventional LANs from supporting more than a few workstations. Thus,
although currently available desktop multimedia computers frequently include
videoconferencing and other multimedia or collaborative capabilities within their
advertised feature set (see, e.g., A. Reinhardt, "Video Conquers the Desktop", BYTE,
September 1993, pp. 64-90), such systems have not yet solved the many problems
inherent in any practical implementation of a scalable collaboration system.

SUMMARY OF THE INVENTION 

The present invention in its various aspects is defined in the independent
claims appended to this description. Advantageous features are set forth in the
appendant claims.

A preferred embodiment of the present invention is described in detail
below with reference to the drawings. In this embodiment, computer hardware, software
and communications technologies are combined in novel ways to produce a multimedia
collaboration system that greatly facilitates distributed collaboration, in part by
replicating the benefits of face-to-face collaboration. The system tightly integrates a
carefully selected set of multimedia and collaborative capabilities, principal among which
are desktop teleconferencing and multimedia mail.

As used herein, desktop teleconferencing includes real-time audio and/or
video teleconferencing, as well as data conferencing. Data conferencing, in turn,
includes snapshot sharing (sharing of "snapshots" of selected regions of the user's
screen), application sharing (shared control of running applications), shared whiteboard
(equivalent to sharing a "blank" window), and associated telepointing and annotation
capabilities. Teleconferences may be recorded and stored for later playback, including
both audio/video (A/V) and all data interactions.

While desktop teleconferencing supports real-time interactions,
multimedia mail permits the asynchronous exchange of arbitrary multimedia documents,
including "custom-authored" messages and previously recorded teleconferences. Indeed,
it is to be understood that the multimedia capabilities underlying desktop teleconferencing
and multimedia mail also greatly facilitate the creation, viewing, and manipulation of
high-quality multimedia documents in general, including animations and visualizations
that might be developed, for example, in the course of information analysis and
modelling. Further, these animations and visualizations may be generated for individual
rather than collaborative use, such that the present invention has utility beyond a
collaboration context.

The system provides for a collaborative multimedia workstation (CMW)
system wherein very high-quality audio and video capabilities can be readily
superimposed onto an enterprise's existing computing and network infrastructure,
including workstations, LANs, WANs, and building wiring.

In the preferred embodiment, the system architecture employs separate
real-time and asynchronous networks - the former for real-time audio and video, and
the latter for non-real-time audio and video, text, graphics and other data, as well as
control signals. These networks are interoperable across different computers (e.g.,
Macintosh, Intel-based PCs, and Sun workstations), operating systems (e.g., Apple
System 7, DOS/Windows, and UNIX) and network operating systems (e.g., Novell
NetWare and Sun ONC+). In many cases, both networks can actually share the same
cabling and wall jack connector.

The system architecture also accommodates the situation in which the
user's desktop computing and/or communications equipment provides varying levels of
media-handling capability. For example, a collaboration session - whether real-time or
asynchronous - may include participants whose equipment provides capabilities ranging
from audio only (a telephone) or data only (a personal computer with a modem) to a full
complement of real-time, high-fidelity audio and full-motion video, and high-speed data
network facilities.

The CMW system architecture is readily scalable to very large
enterprise-wide network environments accommodating thousands of users. Further, it is
an open architecture that can accommodate appropriate standards. Finally, the CMW
system incorporates an intuitive, yet powerful, user interface, making the system easy to
learn and use.

The system thus provides a distributed multimedia collaboration
environment that achieves the benefits of face-to-face collaboration as nearly as possible,
leverages ("snaps on to") existing computer and network infrastructure to the maximum
extent possible, scales to very large networks consisting of thousands of workstations,
accommodates emerging standards, and is easy to learn and use.

BRIEF DESCRIPTION OF THE DRAWINGS 

The preferred embodiment of the invention will now be described in
detail, by way of example, with reference to the drawings, in which:

Figure 1 is a diagrammatic representation of a multimedia collaboration
system.

Figures 2A and 2B are representations of a computer screen illustrating,
to the extent possible in a still image, the full-motion video and related user interface
displays which may be generated during operation of the system of Figure 1.

Figure 3 is a block and schematic diagram of a "multimedia local area
network" (MLAN).

Figure 4 is a block and schematic diagram illustrating how a plurality of
geographically dispersed MLANs of the type shown in Figure 3 can be connected via a
wide area network.

Figure 5 is a schematic diagram illustrating how collaboration sites at 
distant locations L1-L8 are conventionally interconnected over a wide area network by 
individually connecting each site to every other site. 

Figure 6 is a schematic diagram illustrating how collaboration sites at
distant locations L1-L8 are interconnected over a wide area network using a
multi-hopping approach.

Figure 7 is a block diagram illustrating video mosaicing circuitry
provided in the MLAN of Figure 3.

Figures 8A, 8B and 8C illustrate the video window on a typical
computer screen which may be generated during operation of the system, and which
contains only the callee for two-party calls (8A) and a video mosaic of all participants,
e.g., for four-party (8B) or eight-party (8C) conference calls.



Figure 9 is a block diagram illustrating audio mixing circuitry provided
in the MLAN of Figure 3.

Figure 10 is a block diagram illustrating video cut-and-paste circuitry
provided in the MLAN of Figure 3.

Figure 11 is a schematic diagram illustrating typical operation of the
video cut-and-paste circuitry in Figure 10.

Figures 12-17 (consisting of Figures 12A, 12B, 13A, 13B, 14A, 14B,
15A, 15B, 16, 17A and 17B) illustrate various examples of how the present system
provides video mosaicing, video cut-and-pasting, and audio mixing at a plurality of
distant sites for transmission over a wide area network in order to provide, at the CMW
of each conference participant, video images and audio captured from the other
conference participants.

Figures 18A and 18B illustrate two different forms of a CMW which
may be employed in the present system.

Figure 19 is a schematic diagram of a CMW add-on box containing
integrated audio and video I/O circuitry.

Figure 20 illustrates a CMW integrated with a standard multi-tasking
operating system and applications software.

Figure 21 illustrates software modules which may be provided for
running on the MLAN Server in the MLAN of Figure 3 for controlling operation of the
AV and Data Networks.

Figure 22 illustrates an enlarged example of "speed-dial" face icons of
certain collaboration participants in a Collaboration Initiator window on a typical CMW
screen which may be generated during operation of the present system.

Figure 23 is a diagrammatic representation of the basic operating events 
occurring in a preferred system during initiation of a two-party call. 

Figure 24 is a block and schematic diagram illustrating how physical
connections are established in the MLAN of Figure 3 for physically connecting first and
second workstations for a two-party videoconference call.

Figure 25 is a block and schematic diagram illustrating how physical
connections are established in MLANs such as illustrated in Figure 3, for a two-party call
between a first CMW located at one site and a second CMW located at a remote site.

Figures 26 and 27 are block and schematic diagrams illustrating how 
conference bridging is provided in the MLAN of Figure 3. 



Figure 28 diagrammatically illustrates how a snapshot with annotations
may be stored in a plurality of bitmaps during data sharing.

Figure 29 is a schematic and diagrammatic illustration of the interaction
among multimedia mail (MMM), multimedia call/conference recording (MMCR) and
multimedia document management (MMDM) facilities.

Figure 30 is a schematic and diagrammatic illustration of the multimedia
document architecture employed.

Figure 31A illustrates a centralized Audio/Video Storage Server.

Figure 31B is a schematic and diagrammatic illustration of the
interactions between the Audio/Video Storage Server and the remainder of the CMW
System.

Figure 31C illustrates an alternative form of the interactions illustrated
in Figure 31B.

Figure 31D is a schematic and diagrammatic illustration of the
integration of MMM, MMCR and MMDM facilities.

Figure 32 illustrates a generalized hardware implementation of a
scalable Audio/Video Storage Server.

Figure 33 illustrates a higher throughput version of the server illustrated
in Figure 32, using SCSI-based crosspoint switching to increase the number of possible
simultaneous file transfers.

Figure 34 illustrates the resulting multimedia collaboration environment
achieved by the integration of audio/video/data teleconferencing and MMCR, MMM and
MMDM.

Figures 35-42 illustrate a series of CMW screens which may be
generated during operation of the system for a typical scenario involving a remote expert
who takes advantage of many of the features provided by the present system.



DETAILED DESCRIPTION OF THE PREFERRED
EMBODIMENTS



OVERALL SYSTEM ARCHITECTURE 

Referring initially to Figure 1, illustrated therein is an overall
diagrammatic view of a preferred teleconferencing system or multimedia collaboration
system. As shown, each of a plurality of "multimedia local area networks" (MLANs) 10
connects, via lines 13, a plurality of CMWs (collaborative multimedia workstations) 12-1
to 12-10 and provides audio/video/data networking for supporting collaboration among
CMW users. WAN 15 in turn connects multiple MLANs 10, and typically includes
appropriate combinations of common carrier analog and digital transmission networks.
Multiple MLANs 10 on the same physical premises may be connected via bridges/routers
11, as shown, to WANs and one another.

The system of Figure 1 accommodates both "real time" delay-sensitive
and jitter-sensitive signals (e.g., real-time audio and video signals) and classical
asynchronous data (e.g., data and control signals as well as shared textual, graphics and
other media) communication among multiple CMWs 12 regardless of their location. Although
only ten CMWs 12 are illustrated in Figure 1, it will be understood that many more
could be provided. As also indicated in Figure 1, various other multimedia resources 16,
e.g., VCRs (video cassette recorders), laserdiscs, TV feeds, etc., are connected to
MLANs 10 and are thereby accessible by individual CMWs 12.

CMW 12 in Figure 1 may use any of a variety of known types of
operating system, such as Apple System 7, UNIX, DOS/Windows and OS/2. The
CMWs can also have different types of window systems. Specific examples of a CMW
12 are described hereinafter in connection with Figures 18A and 18B. Note that the
system allows for a mix of operating systems and window systems across individual
CMWs.

CMW 12 provides real-time audio/video/data capabilities along with the
usual data processing capabilities provided by its operating system. For example, Fig.
2A illustrates a CMW screen containing live, full-motion video of three conference
participants, while Figure 2B illustrates data shared and annotated by those conferees
(lower left window). CMW 12 provides for bidirectional communication, via lines 13,
within MLAN 10, for audio/video signals as well as data signals. Audio/video signals
transmitted from a CMW 12 typically comprise a high-quality live video image and audio
of the CMW operator. These signals are obtained from a video camera and microphone
provided at the CMW (via an add-on unit or partially or totally integrated into the
CMW), processed, and then made available to low-cost network transmission
subsystems.

Audio/video signals received by a CMW 12 from MLAN 10 may
typically include: video images of one or more conference participants and associated
audio, video and audio from multimedia mail, previously recorded audio/video from
previous calls and conferences, and standard broadcast television (e.g., CNN). Received
video signals are displayed on the CMW screen or on an adjacent monitor, and the
accompanying audio is reproduced by a speaker provided in or near the CMW. In
general, the required transducers and signal processing hardware could be integrated into
the CMW, or be provided via a CMW add-on unit, as appropriate.

It has been found particularly advantageous to provide the
above-described video at standard NTSC-quality TV performance (i.e., 30 frames per
second at 640x480 pixels per frame and the equivalent of 24 bits of color per pixel) with
accompanying high-fidelity audio (typically between 7 and 15 kHz).
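By way of a rough, illustrative check, the figures above imply the following raw bit rate for one uncompressed video stream. This is a back-of-envelope sketch only: audio is ignored, and the 10 Mbit/s comparison figure for a 10BaseT Ethernet segment is an assumption chosen for illustration rather than a parameter of the system.

```python
# Raw bit rate of the NTSC-quality video described above.
# Frame size, frame rate and colour depth come from the text; the
# 10 Mbit/s 10BaseT figure is an illustrative assumption.

WIDTH_PX = 640
HEIGHT_PX = 480
BITS_PER_PIXEL = 24
FRAMES_PER_SECOND = 30
ETHERNET_10BASET_BPS = 10_000_000


def uncompressed_video_bps(width: int, height: int, bpp: int, fps: int) -> int:
    """Return the uncompressed video bit rate in bits per second."""
    return width * height * bpp * fps


video_bps = uncompressed_video_bps(WIDTH_PX, HEIGHT_PX, BITS_PER_PIXEL, FRAMES_PER_SECOND)
ratio = video_bps / ETHERNET_10BASET_BPS

print(f"{video_bps:,} bit/s")             # 221,184,000 bit/s
print(f"{ratio:.1f}x a 10BaseT segment")  # 22.1x a 10BaseT segment
```

On these assumptions a single uncompressed stream needs more than twenty times the capacity of a shared 10 Mbit/s segment. This is consistent with the observation that conventional LANs cannot carry real-time full-motion video for more than a few workstations.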

MULTIMEDIA LOCAL AREA NETWORK 

Referring next to Figure 3, illustrated therein is an MLAN 10 having ten
CMWs (12-1 to 12-10), coupled therein via lines 13a and 13b. MLAN 10 typically
extends over a distance from around 100 metres to several kilometres (a few hundred feet
to a few miles), and is usually located within a building or a group of proximate
buildings.

Given the current state of networking technologies, it is useful (for the
sake of maintaining quality and minimizing costs) to provide separate signal paths for
real-time audio/video and classical asynchronous data communications (including
digitized audio and video enclosures of multimedia mail messages that are free from
real-time delivery constraints). At the moment, analog methods for carrying real-time
audio/video are preferred. In the future, digital methods may be used. Eventually,
digital audio and video signal paths may be multiplexed with the data signal path as a
common digital stream. Another alternative is to multiplex real-time and asynchronous
data paths together using analog multiplexing methods. For the purposes of illustration,
however, these two signal paths are treated as using physically separate wires.
Further, as this system uses analog networking for audio and video, it also physically
separates the real-time and asynchronous switching vehicles and, in particular, assumes
an analog audio/video switch. In the future, a common switching vehicle (e.g., ATM)
could be used.
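The path selection described above can be sketched as a simple classification rule. The names Path, Traffic and select_path are hypothetical and introduced only for illustration. The point shown is that only live audio/video takes the real-time path, while recorded audio/video mail enclosures and control signals travel over the asynchronous data network.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Path(Enum):
    REAL_TIME_AV = auto()  # separate real-time audio/video signal path
    ASYNC_DATA = auto()    # conventional data LAN path


@dataclass(frozen=True)
class Traffic:
    media: str       # e.g. "audio", "video", "text", "control"
    real_time: bool  # True if subject to real-time delivery constraints


def select_path(t: Traffic) -> Path:
    """Route live audio/video to the real-time path; everything else,
    including recorded audio/video mail enclosures, to the data LAN."""
    if t.real_time and t.media in ("audio", "video"):
        return Path.REAL_TIME_AV
    return Path.ASYNC_DATA


live_video = Traffic("video", real_time=True)
mail_enclosure = Traffic("video", real_time=False)  # digitized mail video
control = Traffic("control", real_time=False)
```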

The MLAN 10 thus can be implemented using conventional technology,
such as typical Data LAN hubs 25 and A/V Switching Circuitry 30 (as used in television
studios and other closed-circuit television networks), linked to the CMWs 12 via
appropriate transceivers and unshielded twisted pair (UTP) wiring. Note in Figure 1 that
lines 13, which interconnect each CMW 12 within its respective MLAN 10, comprise
two sets of lines 13a and 13b. Lines 13a provide bidirectional communication of
audio/video within MLAN 10, while lines 13b provide for the bidirectional
communication of data. This separation permits conventional LANs to be used for data
communications and a supplemental network to be used for audio/video communications.
Although this separation is advantageous in the system illustrated, it is again to be
understood that audio/video/data networking can also be implemented using a single pair
of lines for both audio/video and data communications via a very wide variety of analog
and digital multiplexing schemes.

While lines 13a and 13b may be implemented in various ways, it is
currently preferred to use commonly installed 4-pair UTP telephone wires, wherein one
pair is used for incoming video with accompanying audio (mono or stereo) multiplexed
in, wherein another pair is used for outgoing multiplexed audio/video, and wherein the
remaining two pairs are used for carrying incoming and outgoing data in ways consistent
with existing LANs. For example, 10BaseT Ethernet uses RJ-45 pins 1, 2, 3, and 6,
leaving pins 4, 5, 7, and 8 available for the two A/V twisted pairs. The resulting system
is compatible with standard (AT&T 258A, EIA/TIA 568, 8P8C, 10BaseT, ISDN, etc.)
telephone wiring found commonly throughout telephone and LAN cable plants in most
office buildings throughout the world. These UTP wires are used in hierarchical or peer
arrangements of star topologies to create MLAN 10, described below. Note that the
distance range of the data wires often must match that of the video and audio. Various
UTP-compatible data LAN networks may be used, such as Ethernet, token ring, FDDI,
ATM, etc. For distances longer than the maximum distance specified by the data LAN
protocol, data signals can be additionally processed for proper UTP operation.
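The four-pair allocation described above may be captured as a small cable plan. The pair labels 1 to 4 and the helper functions are hypothetical: the text assigns roles to the four pairs but does not number them. The check simply confirms that each signal type has one incoming and one outgoing pair.

```python
# Hypothetical pair labels 1-4 for a 4-pair UTP cable; roles follow the text.
PAIR_ROLES = {
    1: ("av", "in"),     # incoming video with multiplexed mono/stereo audio
    2: ("av", "out"),    # outgoing multiplexed audio/video
    3: ("data", "in"),   # incoming LAN data
    4: ("data", "out"),  # outgoing LAN data
}


def pairs_for(kind: str) -> list:
    """Return the (hypothetical) pair labels carrying the given signal kind."""
    return sorted(p for p, (k, _) in PAIR_ROLES.items() if k == kind)


def plan_is_valid(roles: dict) -> bool:
    """Each signal kind needs exactly one incoming and one outgoing pair."""
    for kind in ("av", "data"):
        directions = sorted(d for k, d in roles.values() if k == kind)
        if directions != ["in", "out"]:
            return False
    return len(roles) == 4
```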

As shown in Figure 3, lines 13a from each CMW 12 are coupled to a
conventional Data LAN hub 25, which facilitates the communication of data (including
control signals) among such CMWs. Lines 13b in Figure 3 are connected to A/V
Switching Circuitry 30. One or more conference bridges 35 are coupled to A/V
Switching Circuitry 30 and possibly (if needed) the Data LAN hub 25, via lines 35b and
35a, respectively, for providing multi-party conferencing in a particularly advantageous
manner, as will hereinafter be described in detail. A WAN gateway 40 provides for
bidirectional communication between MLAN 10 and WAN 15 in Figure 1. For this
purpose, Data LAN hub 25 and A/V Switching Circuitry 30 are coupled to WAN
gateway 40 via outputs 25a and 30a, respectively. Other devices connect to the A/V
Switching Circuitry 30 and Data LAN hub 25 to add additional features (such as
multimedia mail, conference recording, etc.) as discussed below.

Control of A/V Switching Circuitry 30, conference bridges 35 and
WAN gateway 40 in Figure 3 is provided by MLAN Server 60 via lines 60b, 60c, and
60d, respectively. In one example, MLAN Server 60 supports the TCP/IP network
protocol suite. Accordingly, software processes on CMWs 12 communicate with one
another and MLAN Server 60 via MLAN 10 using these protocols. Other network
protocols could also be used, such as IPX. The manner in which software running on
MLAN Server 60 controls the operation of MLAN 10 will be described in detail
hereinafter.
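The switching role of A/V Switching Circuitry 30 under the control of MLAN Server 60 can be modelled, very loosely, as a crosspoint matrix in which each output is driven by one input. The class and port names below are hypothetical and do not represent the actual command set sent over line 60b. The example shows the two connections a server might make for a two-party call.

```python
from typing import Dict, List, Optional


class CrosspointSwitch:
    """Toy model of an analog A/V crosspoint matrix (hypothetical API).

    Each output port is driven by at most one input port, while one input
    may fan out to several outputs (e.g. one camera feed to several CMWs).
    Ports are named after the device attached to them.
    """

    def __init__(self) -> None:
        self._source_of: Dict[str, str] = {}  # output port -> input port

    def connect(self, src: str, dst: str) -> None:
        # Selecting a new crosspoint for an output replaces its old source.
        self._source_of[dst] = src

    def disconnect(self, dst: str) -> None:
        self._source_of.pop(dst, None)

    def source_of(self, dst: str) -> Optional[str]:
        return self._source_of.get(dst)

    def outputs_fed_by(self, src: str) -> List[str]:
        return sorted(d for d, s in self._source_of.items() if s == src)


# Connections a server might request for a two-party call between
# CMW 12-1 and CMW 12-2: each one's camera feeds the other's display.
switch = CrosspointSwitch()
switch.connect("cmw-12-1", "cmw-12-2")
switch.connect("cmw-12-2", "cmw-12-1")
```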

Note in Figure 3 that Data LAN hub 25, A/V Switching Circuitry 30
and MLAN Server 60 also provide respective lines 25b, 30b, and 60e for coupling to
additional multimedia resources 16 (Figure 1), such as multimedia document
management, multimedia databases, radio/TV channels, etc. Data LAN hub 25 (via
bridges/routers 11 in Figure 1) and A/V Switching Circuitry 30 additionally provide lines
25c and 30c for coupling to one or more other MLANs 10 which may be in the same
locality (i.e., not far enough away to require use of WAN technology). Where WANs
are required, WAN gateways 40 are used to provide the highest quality compression methods
and standards in a shared resource fashion, thus minimizing costs at the workstation for a
given WAN quality level, as discussed below.

The basic operation of the resulting collaboration system shown in
Figures 1 and 3 will next be considered. Important features of the present system reside
in providing not only multi-party real-time desktop audio/video/data teleconferencing
among geographically distributed CMWs, but also in providing from the same desktop
audio/video/data/text/graphics mail capabilities, as well as access to other resources,
such as databases, audio and video files, overview cameras, standard TV channels, etc.
Fig. 2B illustrates a CMW screen showing a multimedia EMAIL mailbox (top left
window) containing references to a number of received messages along with a video
enclosure (top right window) to the selected message.

Returning to Figures 1 and 3, A/V Switching Circuitry 30 (whether
digital or analog) provides common audio/video switching for CMWs 12, conference
bridges 35, WAN gateway 40 and multimedia resources 16, as determined by MLAN
Server 60, which in turn controls conference bridges 35 and WAN gateway 40.
Similarly, asynchronous data is communicated within MLAN 10 utilizing common data
communications formats where possible (e.g., for snapshot sharing) so that the system
can handle such data in a common manner, regardless of origin, thereby facilitating
multimedia mail and data sharing as well as audio/video communications.

For example, to provide multi-party teleconferencing, an initiating
CMW 12 signals MLAN Server 60 via Data LAN hub 25, identifying the desired
conference participants. After determining which of these conferees will accept the call,
MLAN Server 60 controls A/V Switching Circuitry 30 (and CMW software via the data
network) to set up the required audio/video and data paths to conferees at the same
location as the initiating CMW.
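The set-up sequence described above may be sketched as follows, assuming a single local MLAN. The function set_up_call, the accepts callback and the bridge port names are hypothetical. Two accepting parties are cross-connected directly, while three or more are each routed into and out of a conference bridge 35. This routing rule is an assumed simplification of the bridging arrangement described later.

```python
from typing import Callable, List, Tuple

Route = Tuple[str, str]  # (source port, destination port) on the A/V switch


def set_up_call(initiator: str,
                invitees: List[str],
                accepts: Callable[[str], bool],
                bridge: str = "bridge-1") -> List[Route]:
    """Return the A/V routes a server might program for a local call.

    Only invitees for which accepts() returns True are connected.
    Bridge port names ("bridge-1/in-k", "bridge-1/out-k") are hypothetical.
    """
    conferees = [initiator] + [p for p in invitees if accepts(p)]
    if len(conferees) < 2:
        return []  # nobody accepted: no paths are set up
    if len(conferees) == 2:
        a, b = conferees
        return [(a, b), (b, a)]  # direct two-party cross-connection
    routes: List[Route] = []
    for k, cmw in enumerate(conferees, start=1):
        routes.append((cmw, f"{bridge}/in-{k}"))   # conferee -> bridge
        routes.append((f"{bridge}/out-{k}", cmw))  # bridge mix -> conferee
    return routes
```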

When one or more conferees are at distant locations, the respective
MLAN Servers 60 of the involved MLANs 10, on a peer-to-peer basis, control their
respective A/V Switching Circuitry 30, conference bridges 35, and WAN gateways 40 to
set up appropriate communication paths (via WAN 15 in Figure 1) as required for
interconnecting the conferees. MLAN Servers 60 also communicate with one another via
data paths so that each MLAN 10 contains updated information as to the capabilities of
all of the system CMWs 12, and also the current locations of all parties available for
teleconferencing.
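The exchange of capability and location information between peer MLAN Servers 60 could, for example, be realised as a directory merge. The version counter used below to decide which entry is newer is an assumption made for this sketch; the text does not specify how conflicting updates are resolved. Entry and merge are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Entry:
    location: str            # MLAN at which the user is currently reachable
    capabilities: frozenset  # e.g. {"audio", "video", "data"}
    version: int             # hypothetical per-user update counter


def merge(local: Dict[str, Entry], remote: Dict[str, Entry]) -> Dict[str, Entry]:
    """Merge a peer server's directory into ours, keeping the newer entry
    (higher version) for each user; the inputs are not modified."""
    merged = dict(local)
    for user, entry in remote.items():
        if user not in merged or entry.version > merged[user].version:
            merged[user] = entry
    return merged


server_a = {"alice": Entry("mlan-A", frozenset({"audio", "video"}), 3)}
server_b = {
    "alice": Entry("mlan-B", frozenset({"audio"}), 5),  # alice has moved
    "bob": Entry("mlan-B", frozenset({"data"}), 1),
}
```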

The data conferencing component of the above-described system
supports the sharing of visual information at one or more CMWs (as described in greater
detail below). This encompasses both "snapshot sharing" (sharing "snapshots" of
complete or partial screens, or of one or more selected windows) and "application
sharing" (sharing both the control and display of running applications). When
transferring images, lossless or slightly lossy image compression can be used to reduce
network bandwidth requirements and user-perceived delay while maintaining high image
quality.
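As an illustration of lossless snapshot compression, the sketch below uses Python's standard zlib (DEFLATE) module as a stand-in codec; the text does not name a particular compression method. The 320x240 RGB snapshot region is synthetic and hypothetical, chosen to resemble a mostly blank window such as a text or diagram area.

```python
import zlib

# Hypothetical 320x240 snapshot region, 3 bytes (RGB) per pixel:
# a white background with one dark horizontal band.
WIDTH, HEIGHT = 320, 240
rows = []
for y in range(HEIGHT):
    colour = b"\x20\x20\x20" if 100 <= y < 120 else b"\xff\xff\xff"
    rows.append(colour * WIDTH)
snapshot = b"".join(rows)

# Lossless: the receiving CMW recovers the exact pixels.
compressed = zlib.compress(snapshot, 9)
restored = zlib.decompress(compressed)

print(len(snapshot), "->", len(compressed), "bytes")
```

Real screen content compresses less well than this synthetic region. Where a slightly lossy codec is acceptable, as the text allows, the exact-match property shown here would not hold.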

In all cases, any participant can point at or annotate the shared data.
These associated telepointers and annotations appear on every participant's CMW screen
as they are drawn (i.e., effectively in real time). For example, note Figure 2B, which
illustrates a typical CMW screen during a multi-party teleconferencing session, wherein
the screen contains annotated shared data as well as video images of the conferees. As
described in greater detail below, all or portions of the audio/video and data of the
teleconference can be recorded at a CMW (or within MLAN 10), complete with all the
data interactions.
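The requirement that every annotation appear on every participant's screen can be sketched as a fan-out of stroke events. Keeping strokes in a per-participant overlay separate from the shared image is an assumption of this sketch, and SharedSnapshotSession and Stroke are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Stroke:
    author: str
    points: Tuple[Tuple[int, int], ...]  # telepointer/pen positions


@dataclass
class SharedSnapshotSession:
    participants: List[str]
    # Each participant's overlay of strokes drawn on the shared image.
    overlays: Dict[str, List[Stroke]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for p in self.participants:
            self.overlays.setdefault(p, [])

    def draw(self, stroke: Stroke) -> None:
        # Deliver the stroke to every participant, including its author,
        # so all screens show the same annotations in the same order.
        for p in self.participants:
            self.overlays[p].append(stroke)


session = SharedSnapshotSession(["cmw-1", "cmw-2", "cmw-3"])
session.draw(Stroke("cmw-1", ((10, 10), (40, 10))))
session.draw(Stroke("cmw-3", ((5, 20),)))
```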

In the above-described arrangement, audio/video file services can be
implemented either at the individual CMWs 12 or by employing a centralized
audio/video storage server. This is one example of the many types of additional servers
that can be added to the basic system of MLANs 10. A similar approach is used for
incorporating other multimedia services, such as commercial TV channels, multimedia
mail, multimedia document management, multimedia conference recording, visualization
servers, etc. (as described in greater detail below). Certainly, applications that run
self-contained on a CMW can be readily added, but the system extends this capability
greatly in the way that MLAN 10, storage and other functions are implemented and
leveraged.

In particular, standard signal formats, network interfaces, user interface
messages, and call models can allow virtually any multimedia resource to be smoothly
integrated into the system. Factors facilitating such smooth integration include: (i) a
common mechanism for user access across the network; (ii) a common metaphor (e.g.,
placing a call) for the user to initiate use of such resource; (iii) the ability for one
function (e.g., a multimedia conference or multimedia database) to access and exchange
information with another function (e.g., multimedia mail); and (iv) the ability to extend
such access of one networked function by another networked function to relatively
complex nestings of simpler functions (for example, recording a multimedia conference in
which a group of users has accessed multimedia mail messages and transferred them to a
multimedia database, and then sending part of the conference recording just created as a
new multimedia mail message, utilizing a multimedia mail editor if necessary).

A simple example of the smooth integration of functions made possible
by the above-described approach is that the GUI (graphical user interface) and software
used for snapshot sharing (described below) can also be used as an input/output interface
for multimedia mail and more general forms of multimedia documents. This can be
accomplished by structuring the interprocess communication protocols to be uniform
across all these applications. More complicated examples (specifically multimedia
conference recording, multimedia mail and multimedia document management) will be
presented in detail below.
WIDE AREA NETWORK

Next to be described, in connection with Figure 4, is the advantageous
manner in which the present system provides for real-time audio/video/data
communication among geographically dispersed MLANs 10 via WAN 15 (Figure 1),
whereby communication delays, cost and degradation of video quality are significantly
reduced relative to what would otherwise be expected.

Four MLANs 10 are illustrated at locations A, B, C and D. CMWs
12-1 to 12-10, A/V Switching Circuitry 30, Data LAN hub 25, and WAN gateway 40 at
each location correspond to those shown in Figures 1 and 3. Each WAN gateway 40 in
Figure 4 will be seen to comprise a router/codec (R&C) bank 42 coupled to WAN 15 via
WAN switching multiplexer 44. The router is used for data interconnection and the
codec is used for audio/video interconnection (for multimedia mail and document
transmission, as well as videoconferencing). Codecs from multiple vendors, or codecs
supporting various compression algorithms, may be employed. As shown, the router and
codec are combined with the switching multiplexer to form a single integrated unit.

Typically, WAN 15 is comprised of T1 or ISDN
common-carrier-provided digital links (switched or dedicated), in which case WAN
switching multiplexers 44 are of the appropriate type (T1, ISDN, fractional T1, T3,
switched 56 Kbps, etc.). Note that the WAN switching multiplexer 44 typically creates
subchannels whose bandwidth is a multiple of 64 Kbps (e.g., 256, 384, 768 Kbps, etc.)
among the T1, T3 or ISDN carriers. Inverse multiplexers may be required when using
56 Kbps dedicated or switched services from these carriers.
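The subchannel arithmetic can be illustrated with a short sketch. It assumes a T1 carrier offering 24 channels of 64 Kbps each (1,536 Kbps of payload); the function names are hypothetical and are not part of the described system. A 56 Kbps rate is rejected because it is not a multiple of 64 Kbps.

```python
# Sketch: relating WAN subchannel rates to 64 Kbps channel units.
# Assumption: a T1 carrier carries 24 channels of 64 Kbps each.
T1_CHANNELS = 24
CHANNEL_KBPS = 64


def channels_needed(rate_kbps: int) -> int:
    """Number of 64 Kbps channels a subchannel of rate_kbps occupies."""
    if rate_kbps <= 0 or rate_kbps % CHANNEL_KBPS != 0:
        raise ValueError(f"{rate_kbps} Kbps is not a positive multiple of 64 Kbps")
    return rate_kbps // CHANNEL_KBPS


def subchannels_per_carrier(rate_kbps: int, carrier_channels: int = T1_CHANNELS) -> int:
    """How many whole subchannels of rate_kbps fit on one carrier."""
    return carrier_channels // channels_needed(rate_kbps)
```

For example, a 384 Kbps subchannel occupies six 64 Kbps channels, so four such subchannels fit on one T1.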

In the MLAN 10 to WAN 15 direction, router/codec bank 42 in Figure
4 provides conventional analog-to-digital conversion and compression of audio/video
signals received from A/V Switching Circuitry 30 for transmission to WAN 15 via WAN
switching multiplexer 44, along with transmission and routing of data signals received
from Data LAN hub 25. In the WAN 15 to MLAN 10 direction, each router/codec
bank 42 in Figure 4 provides digital-to-analog conversion and decompression of
audio/video digital signals received from WAN 15 via WAN switching multiplexer 44
for transmission to A/V Switching Circuitry 30, along with the transmission to Data
LAN hub 25 of data signals received from WAN 15.

The system also provides optimal routes for audio/video signals through
the WAN. For example, in Figure 4, location A can take either a direct route to location
D via path 47, or a two-hop route through location C via paths 48 and 49. If the direct
path 47 linking location A and location D is unavailable, the two-hop route via location
C and paths 48 and 49 can be used instead.

In a more complex network, several multi-hop routes are typically
available, in which case the routing system handles the decision making, which, for
example, can be based on network loading considerations. Note the resulting two-level
network hierarchy: an MLAN 10 to MLAN 10 (i.e., site-to-site) service connecting
codecs with one another only at connection endpoints.
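The routing decision is not specified beyond being based on considerations such as network load. The sketch below assumes a lowest-cost path search (Dijkstra's algorithm) in which each WAN link carries a hypothetical load cost; the function name and the cost values are illustrative only. It uses the Figure 4 paths 47, 48 and 49 to show the direct route being preferred and the two-hop route via location C taking over when path 47 is down.

```python
import heapq


def best_route(links, src, dst, down=frozenset()):
    """Lowest-cost route from src to dst, ignoring links listed in `down`.

    links: dict mapping a path label to (site_a, site_b, cost), where cost
    is a hypothetical load metric. Returns the list of sites, or None.
    """
    graph = {}
    for label, (a, b, cost) in links.items():
        if label in down:
            continue
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    heap = [(0, src, [src])]
    settled = set()
    while heap:
        dist, site, path = heapq.heappop(heap)
        if site == dst:
            return path
        if site in settled:
            continue
        settled.add(site)
        for nxt, cost in graph.get(site, []):
            if nxt not in settled:
                heapq.heappush(heap, (dist + cost, nxt, path + [nxt]))
    return None


# Figure 4 paths between locations A, C and D (costs are illustrative).
FIG4 = {47: ("A", "D", 1), 48: ("A", "C", 1), 49: ("C", "D", 1)}
```

With equal costs, `best_route(FIG4, "A", "D")` returns the direct route `["A", "D"]`; with `down={47}` it returns `["A", "C", "D"]`. Raising the cost of path 47 (e.g., under heavy load) also shifts traffic onto the two-hop route.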

The cost savings made possible by providing the above-described
multi-hop capability (with intermediate codec bypassing) are very significant, as will
become evident by noting the examples of Figures 5 and 6. Figure 5 shows that using
the conventional "fully connected mesh" location-to-location approach, twenty-eight
WAN links are required for interconnecting the eight locations L1 to L8. On the other
hand, using the above multi-hop capabilities, only nine WAN links are required, as
shown in Figure 6. As the number of locations increases, the difference in cost becomes
even greater. For example, for 100 locations, the conventional approach would require
about 5,000 WAN links, while the multi-hop approach of the present system would
typically require 300 or fewer (possibly considerably fewer) WAN links. Although
specific WAN links for the multi-hop approach would require higher bandwidth to carry
the additional traffic, the cost involved is very much smaller as compared to the cost for
the very much larger number of WAN links required by the conventional approach.
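The link counts quoted above follow from the full-mesh formula n(n-1)/2. The sketch below computes it, together with the n-1 links a spanning tree needs as a lower bound for any connected multi-hop topology; the nine links of Figure 6 and the figure of 300 or fewer for 100 locations both lie between these bounds. The function names are illustrative.

```python
def full_mesh_links(n: int) -> int:
    """Links needed to connect every pair of n locations directly."""
    return n * (n - 1) // 2


def min_connected_links(n: int) -> int:
    """Lower bound for any connected multi-hop topology (a spanning tree)."""
    return max(n - 1, 0)
```

For eight locations this gives 28 mesh links against a minimum of 7; for 100 locations, 4,950 mesh links against a minimum of 99.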

At the endpoints of a wide-area call, the WAN switching multiplexer
routes audio/video signals directly from the WAN network interface through an available
codec to MLAN 10 and vice versa. At intermediate hops in the network, however, video
signals are routed from one network interface on the WAN switching multiplexer to
another network interface. Although A/V Switching Circuitry 30 could be used for this
purpose, the preferred system provides switching functionality inside the WAN switching
multiplexer. By doing so, it avoids having to route audio/video signals through codecs to
the analog switching circuitry, thereby avoiding additional codec delays at the
intermediate locations.

A product capable of performing the basic switching functions described 
above for WAN switching multiplexer 44 is available from Teleos Corporation, 
Eatontown, New Jersey (U.S.A.). This product is not known to have been used for 
providing audio/video multi-hopping and dynamic switching among various WAN links 
as described above. 
In addition to the above-described multiple-hop approach, the present
system provides a particularly advantageous way of minimizing delay, cost and
degradation of video quality in a multiparty video teleconference involving
geographically dispersed sites, while still delivering full conference views of all
participants. Normally, in order for the CMWs at all sites to be provided with live
audio/video of every participant in a teleconference simultaneously, each site has to
allocate (in router/codec bank 42 in Figure 4) a separate codec for each participant, as
well as a like number of WAN trunks (via WAN switching multiplexer 44 in Figure 4).

As will next be described, however, the system illustrated
advantageously permits each wide area audio/video teleconference to use only one codec
at each site, and a minimum number of WAN digital trunks. Basically, the system
illustrated achieves this most important result by employing "distributed" video
mosaicing via a video "cut-and-paste" technology along with distributed audio mixing.

DISTRIBUTED VIDEO MOSAICING 
Figure 7 illustrates a preferred way of providing video mosaicing in the
MLAN of Figure 3, i.e., by combining the individual analog video pictures from the
individuals participating in a teleconference into a single analog mosaic picture. As
shown in Figure 7, analog video signals 112-1 to 112-n from the participants of a
teleconference are applied to video mosaicing circuitry 36, which is provided as part of
conference bridge 35 in Figure 3. These analog video inputs 112-1 to 112-n are obtained
from the A/V Switching Circuitry 30 (Figure 3) and may include video signals from
CMWs at one or more distant sites (received via WAN gateway 40) as well as from other
CMWs at the local site.

Video mosaicing circuitry 36 is capable of receiving N individual analog
video picture signals (where N is a squared integer, i.e., 4, 9, 16, etc.). Circuitry 36
first reduces the size of the N input video signals by reducing the resolution of each by
a factor of M (where M is the square root of N, i.e., 2, 3, 4, etc.), and then arranges
them in an M-by-M mosaic of N images. The resulting single analog mosaic 36a obtained
from video mosaicing circuitry 36 is then transmitted to the individual CMWs for display
on their screens.
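The M-by-M arrangement can be expressed as a tile layout. The sketch below works on a digital frame of a hypothetical width and height (the circuitry described operates on analog signals). It checks that N is a perfect square, scales each input down by M in each dimension, and returns the rectangle each input occupies, in row-major order.

```python
import math


def mosaic_layout(n_inputs: int, width: int, height: int):
    """Tile rectangles (x, y, w, h) for an M-by-M mosaic of N = M*M inputs."""
    m = math.isqrt(n_inputs)
    if m * m != n_inputs:
        raise ValueError("N must be a perfect square (4, 9, 16, ...)")
    tile_w, tile_h = width // m, height // m  # each input reduced by a factor of M
    return [((i % m) * tile_w, (i // m) * tile_h, tile_w, tile_h)
            for i in range(n_inputs)]
```

For a quad mosaic (N=4, M=2) on an assumed 640x480 frame, each participant occupies a 320x240 region, and the fourth input lands in the lower-right quadrant at (320, 240).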

As will become evident hereinafter, it may be preferable to send a
different mosaic to distant sites, in which case video mosaicing circuitry 36 would
provide an additional mosaic 36b for this purpose. A typical displayed mosaic picture
(N=4, M=2) showing three participants is illustrated in Figure 2A. A mosaic
containing four participants is shown in Figure 8B. It will be appreciated that, since a
mosaic (36a or 36b) can be transmitted as a single video picture to another site via
WAN 15 (Figures 1 and 4), only one codec and digital trunk are required. Of course, if
only a single individual video picture is required to be sent from a site, it may be sent
directly without being included in a mosaic.

Note that for large conferences it is possible to employ multiple video
mosaics, one for each video window supported by the CMWs (see, e.g., Figure 8C). In
very large conferences, it is also possible to display video only from a select focus group
whose members are selected by a dynamic "floor control" mechanism. Also note that,
with additional mosaic hardware, it is possible to give each CMW its own mosaic. Since
a CMW's own mosaic need not show its own user, this can be used in small conferences
to raise the maximum number of participants (from M² to M² + 1, i.e., 5, 10, 17, etc.)
or to give everyone in a large conference their own "focus group" view.
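The M² + 1 figure can be checked with a short sketch: when each CMW receives its own mosaic, that mosaic only needs to carry the other participants' pictures, so M² regions serve M² + 1 conferees. The function names are hypothetical.

```python
def max_participants_with_own_mosaics(m: int) -> int:
    """Each CMW gets its own M-by-M mosaic showing everyone but its user."""
    return m * m + 1


def own_mosaic_members(participants, viewer, m):
    """Pictures placed in viewer's personal mosaic: all others, up to M*M."""
    others = [p for p in participants if p != viewer]
    if len(others) > m * m:
        raise ValueError("more remote pictures than mosaic regions")
    return others
```

With M = 2, five participants A through E fit: participant C's mosaic holds A, B, D and E.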

Also note that the entire video mosaicing approach described thus far
and continued below applies equally if digital video transmission is used in lieu of analog
transmission, particularly since both mosaic and video window implementations use
digital formats internally and in current products are transformed to and from analog for
external interfacing. In particular, note that mosaicing can be done digitally without
decompression with many existing compression schemes. Further, with an all-digital
approach, mosaicing can be done as needed directly on the CMW.

Figure 9 illustrates audio mixing circuitry represented by block 38 for
use in conjunction with the video mosaicing circuitry 36 in Figure 7, both of which may
be part of conference bridges 35 in Figure 3. As shown in Figure 9, audio signals 114-1
to 114-n are applied to audio mixing or summing circuitry 38 for combination. These
input audio signals 114-1 to 114-n may include audio signals from local participants as
well as audio sums from participants at distant sites. Audio mixing circuitry 38 provides
a respective "minus-1" sum output 38-1, 38a-2, etc. for each participant. Thus, each
participant hears every conference participant's audio except his/her own.
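A "minus-1" mix gives each participant the sum of everyone else's audio. The sketch below assumes each signal is a list of integer samples of equal length. It forms the full sum once and subtracts each participant's own samples, which yields the same result as summing the others directly.

```python
def minus_one_mixes(signals):
    """For each participant, the sum of every other participant's audio.

    signals: dict mapping a name to a list of samples (equal lengths assumed).
    """
    total = [sum(samples) for samples in zip(*signals.values())]
    return {name: [t - s for t, s in zip(total, own)]
            for name, own in signals.items()}
```

For example, with A = [1, 2], B = [10, 20] and C = [100, 200], participant A hears [110, 220], the combined audio of B and C.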

In the system illustrated, sums are decomposed and formed in a
distributed fashion, creating partial sums at one site which are completed at other sites by
appropriate signal insertion. Accordingly, audio mixing circuitry 38 is able to provide
one or more additional sums, such as indicated by output 38, for sending to other sites
having conference participants.
Next to be considered is the manner in which video cut-and-paste
techniques are advantageously employed. Since video mosaics and/or individual video
pictures may be sent from one or more other sites, the question arises as to how these
situations are handled.

Video cut-and-paste circuitry 39, as illustrated in Figure 10, is provided for this purpose,
and may also be incorporated in the conference bridges 35 in Figure 3.

Referring to Figure 10, video cut-and-paste circuitry 39 receives analog
video inputs 116, which may be comprised of one or more mosaics or single video
pictures received from one or more distant sites and a mosaic or single video picture
produced by the local site. It is assumed that the local video mosaicing circuitry 36
(Figure 7) and the video cut-and-paste circuitry 39 have the capability of handling all of
the applied individual video pictures, or at least are able to choose which ones are to be
displayed based on existing available signals.

The video cut-and-paste circuitry 39 digitizes the incoming analog video
inputs 116, selectively rearranges the digital signals on a region-by-region basis to
produce a single digital M-by-M mosaic having individual pictures in selected regions,
and then converts the resulting digital mosaic back to analog form to provide a single
analog mosaic picture 39a, having the individual input video pictures in appropriate
regions, for sending to local participants (and other sites where required). This
resulting cut-and-paste analog mosaic 39a will provide the same type of display as
illustrated in Figure 8B. As will become evident hereinafter, it is sometimes beneficial to
send different cut-and-paste mosaics to different sites, in which case video cut-and-paste
circuitry 39 will provide additional cut-and-paste mosaics 39b-1, 39b-2, etc. for this
purpose.

Figure 11 diagrammatically illustrates an example of how video
cut-and-paste circuitry may operate to provide the cut-and-paste analog mosaic 39a. As
shown in Figure 11, four digitized individual signals 116a, 116b, 116c derived from the
input video signals are "pasted" into selected regions of a digital frame buffer 17 to form
a digital 2x2 mosaic, which is converted into an output analog video mosaic 39a or 39b
in Figure 10. The required audio partial sums may be provided by audio mixing
circuitry 38 in Figure 9 in the same manner, replacing each cut-and-paste video operation
with a partial sum operation.
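The region-by-region paste into a frame buffer can be sketched with 2-D lists standing in for digitized pictures. The function name and the tiny 4x4 "frame" are illustrative assumptions. The example completes an imported mosaic containing A, B and C, whose fourth region is empty, by pasting in a local picture D, as in the cut-and-paste steps of the multi-site examples below.

```python
def paste_into_mosaic(frame, tile, region, m):
    """Copy `tile` (a 2-D list of pixels) into region `region` (row-major)
    of an M-by-M mosaic held in `frame`. Assumes frame size = M x tile size."""
    tile_h, tile_w = len(tile), len(tile[0])
    row0 = (region // m) * tile_h
    col0 = (region % m) * tile_w
    for r in range(tile_h):
        frame[row0 + r][col0:col0 + tile_w] = tile[r]
    return frame


# Imported 2x2 mosaic (4x4 "pixels") with regions A, B, C filled, region 3 empty.
imported = [list("AABB"), list("AABB"), list("CC.."), list("CC..")]
local_d = [list("DD"), list("DD")]
result = paste_into_mosaic([row[:] for row in imported], local_d, region=3, m=2)
```

After the paste, the bottom rows of `result` read "CCDD", while the regions holding A, B and C are left untouched.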

Having described, in connection with Figures 7-11, how video mosaicing,
audio mixing, video cut-and-pasting, and distributed audio mixing may be performed, the
following description of Figures 12-17 will illustrate how these capabilities may
advantageously be used in combination in the context of wide-area videoconferencing.
For these examples, the teleconference is assumed to have four participants designated as
A, B, C and D, in which case 2x2 (quad) mosaics are employed. It is to be understood
that greater numbers of participants could be provided. Also, two or more
simultaneously occurring teleconferences could be handled, in which case additional
mosaicing, cut-and-paste and audio mixing circuitry would be provided at the various
sites along with additional WAN paths. For each example, the "A" figure illustrates the
video mosaicing and cut-and-pasting provided, and the corresponding "B" figure (having
the same figure number) illustrates the associated audio mixing provided. Note that these
figures indicate typical delays that might be encountered for each example (with a single
"UNIT" delay ranging from 0-450 milliseconds, depending upon available compression
technology).

Figures 12A and 12B illustrate a 2-site example having two participants
A and B at Site #1 and two participants C and D at Site #2. Note that this example
requires mosaicing and cut-and-paste at both sites.

Figures 13A and 13B illustrate another 2-site example, but having three
participants A, B and C at Site #1 and one participant D at Site #2. Note that this
example requires mosaicing at both sites, but cut-and-paste only at Site #2.

Figures 14A and 14B illustrate a 3-site example having participants A
and B at Site #1, participant C at Site #2, and participant D at Site #3. At Site #1, the
three local videos A, B and C are put into a mosaic which is sent to both Site #2 and Site
#3. At Site #2 and Site #3, cut-and-paste is used to insert the single video (C or D) at
that site into the empty region in the imported A, B, C mosaic, as shown. Accordingly,
mosaicing is required at all three sites, and cut-and-paste is required only for Site #2 and
Site #3.

Figures 15A and 15B illustrate another 3-site example having participant
A at Site #1, participant B at Site #2, and participants C and D at Site #3. Note that
mosaicing and cut-and-paste are required at all sites. Site #2 additionally has the
capability to send different cut-and-paste mosaics to Site #1 and Site #3. Further note
with respect to Figure 15B that Site #2 creates minus-1 audio mixes for Site #1 and Site
#2, but only provides a partial audio mix (A+B) for Site #3. These partial mixes are
completed at Site #3 by mixing in C's signal to complete D's mix (A+B+C) and D's
signal to complete C's mix (A+B+D).
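The Figure 15B partial-sum scheme can be traced with a few lines of arithmetic. The sample values below are hypothetical powers of two, so each contributor's presence in a mix is visible in the result. Site #2 forms the partial mix A+B once and sends it to Site #3, which completes both local mixes by inserting the other local signal.

```python
def mix(*signals):
    """Sample-by-sample sum of equal-length signals."""
    return [sum(samples) for samples in zip(*signals)]


# Hypothetical sample values for participants A, B, C and D.
A, B, C, D = [1, 1], [2, 2], [4, 4], [8, 8]

partial_ab = mix(A, B)          # formed at Site #2, sent to Site #3
mix_for_c = mix(partial_ab, D)  # completed at Site #3: A+B+D
mix_for_d = mix(partial_ab, C)  # completed at Site #3: A+B+C
```

Only one partial sum crosses the WAN to Site #3, yet C hears A, B and D (11 = 1+2+8) and D hears A, B and C (7 = 1+2+4).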
Figure 16 illustrates a 4-site example employing a star topology, having
one participant at each site; that is, participant A is at Site #1, participant B is at Site #2,
participant C is at Site #3, and participant D is at Site #4. An audio implementation is
not illustrated for this example, since standard minus-1 mixing can be performed at Site
#1, and the appropriate sums transmitted to the other sites.

Figures 17A and 17B illustrate a 4-site example that also has only one
participant at each site, but uses a line topology rather than a star topology as in the
example of Figure 16. Note that this example requires mosaicing and cut-and-paste at all
sites. Also note that Site #2 and Site #3 are each required to transmit two different types
of cut-and-paste mosaics.

The system also provides the capability of allowing a conference
participant to select a close-up of a participant displayed on a mosaic. This capability is
provided whenever a full individual video picture is available at that user's site. In such
case, the A/V Switching Circuitry 30 (Figure 3) switches the selected full video picture
(whether obtained locally or from another site) to the CMW that requests the close-up.

Next to be described, in connection with Figures 18A, 18B, 19 and 20,
are various forms of a CMW.
COLLABORATIVE MULTIMEDIA WORKSTATION HARDWARE

One form of a CMW 12 is illustrated in Fig. 18A. Currently available
personal computers (e.g., an Apple Macintosh or an IBM-compatible PC, desktop or
laptop) and workstations (e.g., a Sun SPARCstation) can be adapted to work with the
present system to provide such features as real-time videoconferencing, data
conferencing, multimedia mail, etc. In business situations, it can be advantageous to set
up a laptop to operate with reduced functionality via cellular telephone links and
removable storage media (e.g., CD-ROM, video tape with timecode support, etc.), but
take on full capability back in the office via a docking station connected to the MLAN
10. This requires a voice and data modem as yet another function server attached to the
MLAN.

The currently available personal computers and workstations serve as a
base workstation platform. The addition of certain audio and video I/O devices to the
standard components of the base platform 100 (where standard components include the
display monitor 200, keyboard 300 and mouse or tablet (or other pointing device) 400),
all of which connect with the base platform box through standard peripheral ports 101,
102 and 103, enables the CMW to generate and receive real-time audio and video
signals. These devices include a video camera 500 for capturing the user's image,
gestures and surroundings (particularly the user's face and upper body), a microphone
600 for capturing the user's spoken words (and any other sounds generated at the
CMW), a speaker 700 for presenting incoming audio signals (such as the spoken words
of another participant to a videoconference or audio annotations to a document), a video
input card 130 in the base platform 100 for capturing incoming video signals (e.g., the
image of another participant to a videoconference, or video mail), and a video display
card 120 for displaying video and graphical output on monitor 200 (where video is
typically displayed in a separate window).

These peripheral audio and video I/O devices are readily available from
a variety of vendors and are just beginning to become standard features in (and often
physically integrated into the monitor and/or base platform of) certain personal
computers and workstations. See, e.g., the aforementioned BYTE article ("Video
Conquers the Desktop"), which describes current models of Apple's Macintosh AV
series personal computers and Silicon Graphics' Indy workstations.

Add-on box 800 (shown in Fig. 18A and illustrated in greater detail in
Fig. 19) integrates these audio and video I/O devices with additional functions (such as
adaptive echo cancelling and signal switching) and interfaces with AV Network 901. AV
Network 901 is the part of the MLAN 10 which carries bidirectional audio and video
signals among the CMWs and A/V Switching Circuitry 30, e.g., utilizing existing UTP
wiring to carry audio and video signals (digital or analog).

The AV Network 901 is separate and distinct from the Data Network 902
portion of the MLAN 10, which carries bidirectional data signals among the CMWs and
the Data LAN hub (e.g., an Ethernet network that also utilizes UTP wiring, with a
network interface card 110 in each CMW). Note that each CMW will typically be a
node on both the AV and the Data Networks.

There are several approaches to implementing Add-on box 800. In a
typical videoconference, video camera 500 and microphone 600 capture and transmit
outgoing video and audio signals into ports 801 and 802, respectively, of Add-on box
800. These signals are transmitted via Audio/Video I/O port 805 across AV Network
901. Incoming video and audio signals (from another videoconference participant) are
received across AV Network 901 through Audio/Video I/O port 805. The video signals
are sent out of V-OUT port 803 of CMW add-on box 800 to video input card 130 of base
platform 100, where they are displayed (typically in a separate video window) on
monitor 200 utilizing the standard base platform video display card 120. The audio
signals are sent out of A-OUT port 804 of CMW add-on box 800 and played through
speaker 700 while the video signals are displayed on monitor 200. The same signal flow
occurs for other non-teleconferencing applications of audio and video.

Add-on box 800 can be controlled by CMW software (illustrated in Fig.
20) executed by base platform 100. Control signals can be communicated between base
platform port 104 and Add-on box Control port 806 (e.g., an RS-232, Centronics, SCSI
or other standard communications port).

Many other configurations of the CMW illustrated in Fig. 18A will
work in the present system. For example, Add-on box 800 itself can be implemented as
an add-in card to the base platform 100. Connections to the audio and video I/O devices
need not change, though the connection for base platform control can be implemented
internally (e.g., via the system bus) rather than through an external RS-232 or SCSI
peripheral port. Various additional levels of integration can also be achieved, as will be
evident to those skilled in the art. For example, microphones, speakers, video cameras
and UTP transceivers can be integrated into the base platform 100 itself, and all media
handling technology and communications can be integrated onto a single card.
A handset/headset jack enables the use of an integrated audio I/O device
as an alternative to the separate microphone and speaker. A telephone interface could be
integrated into Add-on box 800 as a local implementation of computer-integrated
telephony. A "hold" (i.e., audio and video mute) switch and/or a separate audio mute
switch could be added to Add-on box 800 if such an implementation were deemed
preferable to a software-based interface.

The internals of Add-on box 800 of Fig. 18A are illustrated in Fig. 19.
Video signals generated at the CMW (e.g., captured by camera 500 of Fig. 18A) are sent
to CMW add-on box 800 via V-IN port 801. They then typically pass unaffected through
Loopback/AV Mute circuitry 830 via video ports 833 (input) and 834 (output) and into
A/V Transceivers 840 (via Video In port 842), where they are transformed from standard
video cable signals to UTP signals and sent out via port 845 and Audio/Video I/O port
805 onto AV Network 901.

The Loopback/AV Mute circuitry 830 can, however, be placed in
various modes under software control via Control port 806 (implemented, for example,
as a standard UART). If in loopback mode (e.g., for testing incoming and outgoing
signals at the CMW), the video signals would be routed back out V-OUT port 803 via
video port 831. If in a mute mode (e.g., muting audio, video or both), video signals
might, for example, be disconnected so that no video signal is sent out video port
834. Loopback and muting switching functionality is also provided for audio in a similar
way. Note that computer control of loopback is very useful for remote testing and
diagnostics, while manual override of computer control on mute provides assured
privacy against use of the workstation for electronic spying.
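The mode handling can be modelled as a small state machine for the outgoing video path. The port numbers follow the text above; the class and method names, and the rule that the manual mute switch overrides any software-selected mode, are illustrative assumptions based on the stated goal of assured privacy.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    LOOPBACK = "loopback"
    MUTE = "mute"


class LoopbackMute:
    """Toy model of Loopback/AV Mute circuitry 830 (outgoing video only).

    Port numbers follow the text; names and override rule are assumptions.
    """

    def __init__(self):
        self.mode = Mode.NORMAL
        self.manual_mute = False  # physical switch; software cannot clear it

    def set_mode(self, mode: Mode) -> None:
        """Software request, e.g. arriving via Control port 806."""
        self.mode = mode

    def route_outgoing_video(self):
        """Port that video entering on port 833 leaves by; None if muted."""
        if self.manual_mute or self.mode is Mode.MUTE:
            return None
        if self.mode is Mode.LOOPBACK:
            return 831  # back toward V-OUT port 803 for local testing
        return 834      # on toward A/V Transceivers 840
```

In this model, software can switch between normal, loopback and mute, but once the manual switch is engaged no software request can re-enable outgoing video.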

Video input (e.g., captured by the video camera at the CMW of another
videoconference participant) is handled in a similar fashion. It is received along AV
Network 901 through Audio/Video I/O port 805 and port 845 of A/V Transceivers 840,
where it is sent out Video Out port 841 to video port 832 of Loopback/AV Mute circuitry
830, which typically passes such signals out video port 831 to V-OUT port 803 (for
receipt by a video input card or other display mechanism, such as LCD display 810 of
CMW Side Mount unit 850 in Fig. 18B, to be discussed).

Audio input and output (e.g., for playback through speaker 700 and
capture by microphone 600 of Fig. 18A) pass through A/V Transceivers 840 (via Audio
In port 844 and Audio Out port 843) and Loopback/AV Mute circuitry 830 (through
audio ports 837/838 and 836/835) in a similar manner. The audio input and output ports
of Add-on box 800 interface with standard amplifier and equalization circuitry, as well as
an adaptive room echo canceller 814 to eliminate echo, minimize feedback and provide
enhanced audio performance when using a separate microphone and speaker. In
particular, use of adaptive room echo cancellers provides high-quality audio interactions
in wide area conferences. Because adaptive room echo cancelling requires training
periods (typically involving an objectionable blast of high-amplitude white noise or tone
sequences) for alignment with each acoustic environment, it is preferred that separate
echo cancelling be dedicated to each workstation rather than sharing a smaller group of
echo cancellers across a larger group of workstations.

Audio inputs passing through audio port 835 of Loopback/AV Mute
circuitry 830 provide audio signals to a speaker (via standard Echo Canceller circuitry
814 and A-OUT port 804) or to a handset or headset (via I/O ports 807 and 808,
respectively, under volume control circuitry 815 controlled by software through Control
port 806). In all cases, incoming audio signals pass through power amplifier circuitry
812 before being sent out of Add-on box 800 to the appropriate audio-emitting
transducer.

Outgoing audio signals generated at the CMW (e.g., by microphone 600
of Fig. 18A or the mouthpiece of a handset or headset) enter Add-on box 800 via A-IN
port 802 (for a microphone) or Handset or Headset I/O ports 807 and 808, respectively.
In all cases, outgoing audio signals pass through standard preamplifier (811) and
equalization (813) circuitry, whereupon the desired signal is selected by standard
"Select" switching circuitry 816 (under software control through Control port 806) and
passed to audio port 837 of Loopback/AV Mute circuitry 830.

It is to be understood that A/V Transceivers 840 may include
muxing/demuxing (multiplexing/demultiplexing) facilities so as to enable the
transmission of audio/video signals on a single pair of wires, e.g., by encoding audio
signals digitally in the vertical retrace interval of the analog video signal.
Implementation of other audio and video enhancements, such as stereo audio and external
audio/video I/O ports (e.g., for recording signals generated at the CMW), is also well
within the capabilities of one skilled in the art. If stereo audio is used in teleconferencing
(i.e., to create useful spatial metaphors for users), a second echo canceller may be
recommended.

Another configuration of the CMW, illustrated in Fig. 18B, utilizes a
separate (fully self-contained) "Side Mount" approach which includes its own dedicated
video display. This arrangement is advantageous in a variety of situations, such as
instances in which additional screen display area is desired (e.g., in a laptop computer or
desktop system with a small monitor) or where it is impossible or undesirable to retrofit
older, existing or specialized desktop computers for audio/video support. In this
example, video camera 500, microphone 600 and speaker 700 of Fig. 18A are integrated
together with the functionality of Add-on box 800. Side Mount 850 eliminates the
necessity of external connections to these integrated audio and video I/O devices, and
includes an LCD display 810 for displaying the incoming video signal (which thus
eliminates the need for a base platform video input card 130).

Given the proximity of Side Mount device 850 to the user, and the direct
access to audio/video I/O within that device, various additional controls 820 can be
provided at the user's touch (all well within the capabilities of those skilled in the art).
Note that, with enough additions, Side Mount unit 850 can become virtually a standalone
device that does not require a separate computer for services using only audio and video.
This also provides a way of supplementing a network of full-feature workstations with a
few low-cost additional "audio video intercoms" for certain sectors of an enterprise (such
as clerical, reception, factory floor, etc.).

A portable laptop implementation can be made to deliver multimedia 
mail with video, audio and synchronized annotations via CD-ROM or an add-on 
videotape unit with separate video, audio and time code tracks (a stereo videotape player
can use the second audio channel for time code signals). Videotapes or CD-ROMs can 
be created in main offices and express mailed, thus avoiding the need for high-bandwidth 
networking when on the road. Cellular phone links can be used to obtain both voice and 
data communications (via modems). Modem-based data communications are sufficient to
support remote control of mail or presentation playback, annotation, file transfer and fax 
features. The laptop can then be brought into the office and attached to a docking station 
where the available MLAN 10 and additional functions adapted from Add-on box 800
can be supplied, providing full CMW capability. 
COLLABORATIVE MULTIMEDIA WORKSTATION SOFTWARE

CMW software modules 160 are illustrated generally in Fig. 20 and 
discussed in greater detail below in conjunction with the software running on MLAN
Server 60 of Fig. 3. Software 160 allows the user to initiate and manage (in conjunction 
with the server software) videoconferencing, data conferencing, multimedia mail and 
other collaborative sessions with other users across the network. 

Also present on the CMW are standard multi-tasking operating 
system/GUI software 180 (e.g., Apple Macintosh System 7, Microsoft Windows 3.1, or 
UNIX with the "X Window System" and Motif or other GUI "window manager"
software) as well as other applications 170, such as word processing and spreadsheet
programs. Software modules 161-168 communicate with operating system/GUI software
180 and other applications 170 utilizing standard function calls and interapplication
protocols. 

The central component of the Collaborative Multimedia Workstation
software is the Collaboration Initiator 161. All collaborative functions can be accessed
through this module. When the Collaboration Initiator is started, it exchanges initial
configuration information with the Audio Video Network Manager (AVNM) 63 on MLAN
Server 60 (shown in Fig. 3) through Data Network 902. Information is also sent from the
Collaboration Initiator to the AVNM indicating the location of the user, the types of services
available on that workstation (e.g., videoconferencing, data conferencing, telephony, etc.) and
other relevant initialization information.

The Collaboration Initiator presents a user interface that allows the user
to initiate collaborative sessions (both real-time and asynchronous). Session participants
can be selected from a graphical "rolodex" 163 that contains a scrollable list of user
names or from a list of quick-dial buttons 162. Quick-dial buttons show the face icons
for the users they represent. The icon representing the user is retrieved by the
Collaboration Initiator from the Directory Service 66 on MLAN Server 60 when it starts
up. Users can dynamically add new quick-dial buttons by dragging the corresponding
entries from the graphical rolodex onto the quick-dial panel.

Once the user elects to initiate a collaborative session, he or she selects
one or more desired participants by, for example, clicking on a participant's name in the
system rolodex or a personal rolodex, or by clicking on the
quick-dial button or icon for that participant (see, e.g., Fig. 2A). In either case, the user
then selects the desired session type, e.g., by clicking on a CALL button to initiate a
videoconference call, a SHARE button to initiate the sharing of a snapshot image or
blank whiteboard, or a MAIL button to send mail. Alternatively, the user can
double-click on the rolodex name or a face icon to initiate the default session type,
e.g., an audio/video conference call.
The system also allows sessions to be invoked from the keyboard. It
provides a graphical editor to bind combinations of participants and session types to
certain hot keys. Pressing such a hot key (possibly in conjunction with a modifier key,
e.g., <Shift> or <Ctrl>) will cause the Collaboration Initiator to start a session of the
specified type with the given participants.
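The hot-key binding described above can be pictured as a lookup table from a key plus its modifiers to a stored combination of session type and participants. The following is a minimal Python sketch under that assumption; the names `HotKeyBindings` and `SessionSpec` and the session-type labels are hypothetical and are not taken from the CMW software.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

@dataclass(frozen=True)
class SessionSpec:
    session_type: str              # e.g. "video call", "share", "mail" (hypothetical labels)
    participants: Tuple[str, ...]  # user names as listed in the rolodex

# A hot key is a key name plus a (possibly empty) set of modifiers such as "Shift" or "Ctrl".
HotKey = Tuple[str, FrozenSet[str]]

class HotKeyBindings:
    """Table that a graphical hot-key editor could fill in (hypothetical class)."""

    def __init__(self) -> None:
        self._bindings: Dict[HotKey, SessionSpec] = {}

    def bind(self, key: str, modifiers: FrozenSet[str], spec: SessionSpec) -> None:
        self._bindings[(key, frozenset(modifiers))] = spec

    def lookup(self, key: str,
               modifiers: FrozenSet[str] = frozenset()) -> Optional[SessionSpec]:
        """Return the session the Collaboration Initiator should start, or None."""
        return self._bindings.get((key, frozenset(modifiers)))
```

Keying on the modifier set means that F5 alone and Ctrl+F5 can start different sessions.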

Once the user selects the desired participant and session type,
Collaboration Initiator module 161 retrieves necessary addressing information from
Directory Service 66 (see Fig. 21). In the case of a videoconference call, the
Collaboration Initiator (or, in another arrangement, VideoPhone module 169) then
communicates with the audio video network manager AVNM (as described in greater
detail below) to set up the necessary data structures and manage the various states of that
call, and to control A/V Switching Circuitry 30, which selects the appropriate audio and
video signals to be transmitted to/from each participant's CMW. In the case of a data
conferencing session, the Collaboration Initiator locates, via the AVNM, the
Collaboration Initiator modules at the CMWs of the chosen recipients, and sends a
message causing the Collaboration Initiator modules to invoke the Snapshot Sharing
modules 164 at each participant's CMW. Subsequent videoconferencing and data
conferencing functionality is discussed in greater detail below in the context of particular
usage scenarios.

As indicated previously, additional collaborative services, such as
Mail 165, Application Sharing 166, Computer-Integrated Telephony 167 and Computer
Integrated Fax 168, are also available from the CMW by utilizing Collaboration
Initiator module 161 to initiate the session (i.e., to contact the participants) and to invoke
the appropriate application necessary to manage the collaborative session. When
initiating asynchronous collaboration (e.g., mail, fax, etc.), the Collaboration Initiator
contacts Directory Service 66 for address information (e.g., EMAIL address, fax
number, etc.) for the selected participants and invokes the appropriate collaboration tools
with the obtained address information. For real-time sessions, the Collaboration Initiator
queries the Service Server module 69 inside AVNM 63 for the current location of the
specified participants. Using this location information, it communicates (via the AVNM)
with the Collaboration Initiators of the other session participants to coordinate session
setup. As a result, the various Collaboration Initiators will invoke modules 166, 167 or
168 (including activating any necessary devices such as the connection between the
telephone and the CMW's audio I/O port). Further details on multimedia mail are
provided below.

MLAN SERVER SOFTWARE 
Figure 21 diagrammatically illustrates software 62 comprised of various
modules (as discussed above) provided for running on MLAN Server 60 (Figure 3). It is
to be understood that additional software modules could also be provided. It is also to be
understood that, although the software illustrated in Figure 21 offers various significant
advantages, as will become evident hereinafter, different forms and arrangements of
software may also be employed. The software can also be implemented in various
sub-parts running as separate processes.

Clients (e.g., software-controlling workstations, VCRs, laserdisks,
multimedia resources, etc.) communicate with the MLAN Server Software Modules 62
using the TCP/IP network protocols. Generally, the AVNM 63 cooperates with the
Service Server 69, Conference Bridge Manager (CBM 64 in Figure 21) and the WAN
Network Manager (WNM 65 in Figure 21) to manage communications within and among
both MLANs 10 and WANs 15 (Figures 1 and 3).

The AVNM additionally cooperates with Audio/Video Storage Server 67 
and other multimedia services 68 in Figure 21 to support various types of collaborative 
interactions as described herein. CBM 64 in Figure 21 operates as a client of the AVNM 
63 to manage conferencing by controlling the operation of conference bridges 35. This
includes management of the video mosaicing circuitry 37, audio mixing circuitry 38 and
cut-and-paste circuitry 39 preferably incorporated therein. WNM 65 manages the
allocation of paths (codecs and trunks) provided by WAN gateway 40 for accomplishing
the communications to other sites called for by the AVNM.

Audio Video Network Manager 
The AVNM 63 manages A/V Switching Circuitry 30 in Figure 3 for
selectively routing audio/video signals to and from CMWs 12, and also to and from
WAN gateway 40, as called for by clients. Audio/video devices (e.g., CMWs 12,
conference bridges 35, multimedia resources 16 and WAN gateway 40 in Figure 3)
connected to A/V Switching Circuitry 30 in Figure 3 have physical connections for
audio in, audio out, video in and video out. For each device on the network, the AVNM
combines these four connections into a port abstraction, wherein each port represents an
addressable bidirectional audio/video channel. Each device connected to the network has
at least one port. Different ports may share the same physical connections on the switch.
For example, a conference bridge may typically have four ports (for 2x2 mosaicing) that
share the same video-out connection. Not all devices need both video and audio
connections at a port. For example, a TV tuner port needs only incoming audio/video
connections.
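The port abstraction can be sketched as a record of up to four physical switch connections, any of which may be absent or shared with another port. This minimal Python sketch assumes integer connection numbers and a simple numbering scheme for the bridge; `Port` and `bridge_ports` are hypothetical names, not the AVNM's actual data structures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Port:
    """One addressable bidirectional audio/video channel (sketch of the port abstraction).

    Each field holds a physical switch connection number (hypothetical numbering);
    None means the port has no such connection.
    """
    name: str
    audio_in: Optional[int] = None
    audio_out: Optional[int] = None
    video_in: Optional[int] = None
    video_out: Optional[int] = None

    def connections(self) -> Dict[str, int]:
        """Return only the physical connections this port actually has."""
        fields = {"audio_in": self.audio_in, "audio_out": self.audio_out,
                  "video_in": self.video_in, "video_out": self.video_out}
        return {k: v for k, v in fields.items() if v is not None}

def bridge_ports(prefix: str, first_conn: int, shared_video_out: int) -> List[Port]:
    """Four ports of a 2x2 mosaicing bridge that share one video-out connection."""
    ports = []
    for i in range(4):
        base = first_conn + 3 * i    # three private connections per port
        ports.append(Port(f"{prefix}-{i + 1}", audio_in=base, audio_out=base + 1,
                          video_in=base + 2, video_out=shared_video_out))
    return ports
```

A TV tuner port, for instance, would be `Port("tv-tuner", audio_in=..., video_in=...)` with its output fields left empty.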

In response to client program requests, the AVNM provides connectivity
between audio/video devices by connecting their ports. Connecting ports is achieved by
switching one port's physical input connections to the other port's physical output
connections (for both audio and video) and vice versa. Client programs can specify
which of the four physical connections on their ports should be switched. This allows client
programs to establish unidirectional calls (e.g., by specifying that only the port's input
connections should be switched and not the port's output connections) and audio-only or
video-only calls (by specifying audio connections only or video connections only).
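The selective switching rule can be illustrated by computing which (source output, destination input) crosspoints a connection request would close. In this minimal Python sketch, ports are assumed to be plain dictionaries of physical connection numbers, and the `selected` set names which of the requesting port's four connections to switch; `crosspoints` is a hypothetical helper, not an AVNM function.

```python
from typing import Dict, FrozenSet, List, Tuple

ALL = frozenset({"audio_in", "audio_out", "video_in", "video_out"})

def crosspoints(a: Dict[str, int], b: Dict[str, int],
                selected: FrozenSet[str] = ALL) -> List[Tuple[int, int]]:
    """Return (from_output, to_input) switch settings for connecting port a to port b.

    `a` and `b` map connection names to physical switch connection numbers.
    `selected` names which of port a's connections should be switched.
    """
    routes = []
    for medium in ("audio", "video"):
        if f"{medium}_in" in selected:    # feed b's output into a's input
            routes.append((b[f"{medium}_out"], a[f"{medium}_in"]))
        if f"{medium}_out" in selected:   # feed a's output into b's input
            routes.append((a[f"{medium}_out"], b[f"{medium}_in"]))
    return routes
```

Selecting only `{"audio_in", "video_in"}` yields a unidirectional call into port a, while `{"audio_in", "audio_out"}` yields an audio-only call.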

Service Server 

Before client programs can access audio/video resources through the
AVNM, they must register the collaborative services they provide with the Service
Server 69. Examples of these services include "video call", "snapshot sharing",
"conference" and "video file sharing". These service records are entered into the
Service Server's service database. The service database thus keeps track of the location
of client programs and the types of collaborative sessions in which they can participate.
This allows the Collaboration Initiator to find collaboration participants no matter where
they are located. The service database is replicated by all Service Servers: Service
Servers communicate with other Service Servers in other MLANs throughout the system
to exchange their service records.

Clients may create a plurality of services, depending on the collaborative
capabilities desired. When creating a service, a client can specify the network resources
(e.g., ports) that will be used by this service. In particular, service information is used to
associate a user with the audio/video ports physically connected to the particular CMW
on which the user is logged in. Clients that want to receive requests do so by putting
their services in listening mode. If clients want to accept incoming data shares, but want
to block incoming video calls, they must create different services.




A client can create an exclusive service on a set of ports to prevent other 
clients from creating services on these ports. This is useful, for example, to prevent 
multiple conference bridges from managing the same set of conference bridge ports.
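The service database behavior described above (registration, listening mode, exclusive ports and replication between Service Servers) can be sketched as a keyed table of service records. This is a minimal Python sketch under stated assumptions: records are keyed by service type and name, an exclusive service conflicts with any overlapping service, and replication is a simple merge. `ServiceRecord`, `ServiceDatabase` and the `host` field are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

@dataclass(frozen=True)
class ServiceRecord:
    service_type: str        # e.g. "video call", "snapshot sharing", "conference"
    name: str                # e.g. the user's address
    host: str                # where the client program runs (hypothetical field)
    ports: FrozenSet[str]    # AVNM ports used by this service
    listening: bool = False  # True if the client accepts incoming requests
    exclusive: bool = False  # True if no other service may use these ports

class ServiceDatabase:
    """Sketch of a Service Server's service database (hypothetical API)."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], ServiceRecord] = {}

    def register(self, rec: ServiceRecord) -> None:
        key = (rec.service_type, rec.name)
        for other_key, other in self._records.items():
            if other_key == key:
                continue                      # re-registering replaces the old record
            overlap = other.ports & rec.ports
            if overlap and (other.exclusive or rec.exclusive):
                raise ValueError(f"ports {sorted(overlap)} are held exclusively")
        self._records[key] = rec

    def find_listening(self, service_type: str, name: str) -> Optional[ServiceRecord]:
        """Return the record only if that service is in listening mode."""
        rec = self._records.get((service_type, name))
        return rec if rec is not None and rec.listening else None

    def merge_from(self, remote: "ServiceDatabase") -> None:
        """Replicate the records held by a Service Server on another MLAN."""
        self._records.update(remote._records)
```

A user who accepts data shares but blocks video calls simply has a listening "snapshot sharing" record and a non-listening "video call" record on the same ports.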

Next to be considered is the preferred manner in which the AVNM 63
(Figure 21), in cooperation with the Service Server 69, CBM 64 and participating
CMWs, provides for managing A/V Switching Circuitry 30 and conference bridges 35 in
Figure 3 during audio/video/data teleconferencing. The participating CMWs may
include workstations located at both local and remote sites.

BASIC TWO-PARTY VIDEOCONFERENCING 

As previously described, a CMW includes a Collaboration Initiator
software module 161 (see Fig. 20), which is used to establish person-to-person and
multiparty calls. The corresponding collaboration initiator window advantageously
provides quick-dial face icons of frequently dialed persons, as illustrated, for example,
in Figure 22, which is an enlarged view of typical face icons along with various initiating
buttons (described in greater detail below in connection with Figs. 35-42).

Videoconference calls can be initiated, for example, merely by
double-clicking on these icons. When a call is initiated, the CMW typically provides a
screen display that includes a live video picture of the remote conference participant, as
illustrated for example in Figure 8A. This display also includes control buttons/menu
items that can be used to place the remote participant on hold, to resume a call on hold,
to add one or more participants to the call, to initiate data sharing and to hang up the
call.

The basic underlying software-controlled operations occurring for a
two-party call are diagrammatically illustrated in Figure 23. After logging in to AVNM 63,
as indicated by (1) in Figure 23, a caller initiates a call (e.g., by selecting a user from the
graphical rolodex and clicking the call button or by double-clicking the face icon of the
callee on the quick-dial panel). The caller's Collaboration Initiator responds by
identifying the selected user and requesting that user's address from Directory Service
66, as indicated by (2) in Figure 23. Directory Service 66 looks up the callee's address
in the directory database, as indicated by (3) in Figure 23, and then returns it to the
caller's Collaboration Initiator, as illustrated by (4) in Figure 23.

The caller's Collaboration Initiator sends a request to the AVNM to
place a video call to the callee with the specified address, as indicated by (5) in Figure
23. The AVNM queries the Service Server to find the service instance of type "video
call" whose name corresponds to the callee's address. This service record identifies the
location of the callee's Collaboration Initiator as well as the network ports that the callee
is connected to. If no service instance is found for the callee, the AVNM notifies the
caller that the callee is not logged in. If the callee is local, the AVNM sends a call event
to the callee's Collaboration Initiator, as indicated by (6) in Figure 23. If the callee is at
a remote site, the AVNM forwards the call request (5) through the WAN gateway 40 for
transmission, via WAN 15 (Figure 1), to the Collaboration Initiator of the callee's CMW
at the remote site.
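The AVNM's three-way decision at step (5) (callee not logged in, callee local, callee remote) can be sketched as a lookup followed by a site comparison. In this minimal Python sketch, the service records are assumed to be a dictionary keyed by callee address, and `VideoCallService`, `route_call` and the returned strings are hypothetical stand-ins for the actual notifications.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class VideoCallService:
    address: str   # callee's address, used as the service name
    site: str      # MLAN where the callee's Collaboration Initiator runs
    port: str      # AVNM port the callee's CMW is connected to

def route_call(services: Dict[str, VideoCallService], callee: str, local_site: str) -> str:
    """Decide what the AVNM does with a "video call" request (step (5), Figure 23)."""
    svc = services.get(callee)
    if svc is None:
        return "not logged in"                        # notify the caller
    if svc.site == local_site:
        return f"call event -> port {svc.port}"       # step (6): callee is local
    return f"forward via WAN gateway -> {svc.site}"   # callee is at a remote site
```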

The callee's Collaboration Initiator can respond to the call event in a
variety of ways. A user-selectable sound is generated to announce the incoming call.
The Collaboration Initiator can then act in one of two modes. In "Telephone Mode", the
Collaboration Initiator displays an invitation message on the CMW screen that contains
the name of the caller and buttons to accept or refuse the call. The Collaboration
Initiator will then accept or refuse the call, depending on which button is pressed by the
callee. In "Intercom Mode", the Collaboration Initiator accepts all incoming calls
automatically, unless there is already another call active on the callee's CMW, in which
case behavior reverts to Telephone Mode.
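The answering rule for the two modes reduces to a short decision: Intercom Mode accepts automatically only when no other call is active, and every other case falls back to asking the user. This minimal Python sketch assumes the invitation dialog can be modeled as a callable returning the user's choice; `AnswerMode` and `answer_call` are hypothetical names.

```python
from enum import Enum
from typing import Callable

class AnswerMode(Enum):
    TELEPHONE = "telephone"
    INTERCOM = "intercom"

def answer_call(mode: AnswerMode, call_active: bool, ask_user: Callable[[], bool]) -> bool:
    """Return True to accept an incoming call, False to refuse it.

    `ask_user` stands in for the invitation dialog with accept/refuse buttons.
    """
    if mode is AnswerMode.INTERCOM and not call_active:
        return True          # Intercom Mode: accept automatically
    return ask_user()        # Telephone Mode, or Intercom Mode with a call already active
```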

The callee's Collaboration Initiator then notifies the AVNM as to
whether the call will be accepted or refused. If the call is accepted (7), the AVNM sets
up the necessary communication paths between the caller and the callee required to
establish the call. The AVNM then notifies the caller's Collaboration Initiator that the
call has been established by sending it an accept event (8). If the caller and callee are at
different sites, their AVNMs will coordinate in setting up the communication paths at
both sites, as required by the call.

The AVNM may manage connections among CMWs and other
multimedia resources for audio/video/data communications in various ways; one such
manner will next be described.

As has been described previously, the AVNM manages the switches in
the A/V Switching Circuitry 30 in Figure 3 to provide port-to-port connections in
response to connection requests from clients. The primary data structure used by the
AVNM for managing these connections will be referred to as a callhandle, which is
composed of a plurality of bits, including state bits.

Each port-to-port connection managed by the AVNM comprises two
callhandles, one associated with each end of the connection. The callhandle at the client
port of the connection permits the client to manage the client's end of the connection.
The callhandle mode bits determine the current state of the callhandle and which of a
port's four switch connections (video in, video out, audio in, audio out) are involved in a
call.

AVNM clients send call requests to the AVNM whenever they want to
initiate a call. As part of a call request, the client specifies the local service in which the
call will be involved, the name of the specific port to use for the call, identifying
information as to the callee, and the call mode. In response, the AVNM creates a
callhandle on the caller's port.

All callhandles are created in the "idle" state. The AVNM then puts the
caller's callhandle in the "active" state. The AVNM next creates a callhandle for the
callee and sends it a call event, which places the callee's callhandle in the "ringing" state.
When the callee accepts the call, its callhandle is placed in the "active" state, which
results in a physical connection between the caller and the callee. Each port can have an
arbitrary number of callhandles bound to it, but typically only one of these callhandles
can be active at the same time.
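The callhandle life cycle described above behaves like a small state machine attached to a port. The following minimal Python sketch encodes the idle, ringing, active and hold states with an allowed-transition table read off the description; the table itself, the class names and the strict one-active-callhandle-per-port check (the text says only "typically") are assumptions, not the AVNM's actual implementation.

```python
from enum import Enum, auto
from typing import List, Set, Tuple

class State(Enum):
    IDLE = auto()
    RINGING = auto()
    ACTIVE = auto()
    HOLD = auto()

# Transitions read off the description; an assumption, not the AVNM's actual table.
TRANSITIONS: Set[Tuple[State, State]] = {
    (State.IDLE, State.ACTIVE),     # caller's handle after the call request
    (State.IDLE, State.RINGING),    # callee's handle on receiving the call event
    (State.RINGING, State.ACTIVE),  # callee accepts
    (State.ACTIVE, State.HOLD),     # call put on hold
    (State.HOLD, State.ACTIVE),     # call resumed
}

class Port:
    """An AVNM port with any number of callhandles bound to it."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.handles: List["CallHandle"] = []

class CallHandle:
    def __init__(self, port: Port) -> None:
        self.port = port
        self.state = State.IDLE          # all callhandles start idle
        port.handles.append(self)

    def set_state(self, new: State) -> None:
        if (self.state, new) not in TRANSITIONS:
            raise ValueError(f"illegal transition {self.state.name} -> {new.name}")
        # Enforce at most one active callhandle per port (the text says "typically").
        if new is State.ACTIVE and any(
                h is not self and h.state is State.ACTIVE for h in self.port.handles):
            raise ValueError(f"port {self.port.name} already has an active call")
        self.state = new
```

Putting a call on hold frees the port's single active slot, so a second callhandle on the same port can then become active.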

After a call has been set up, AVNM clients can send requests to the
AVNM to change the state of the call, which can advantageously be accomplished by
controlling the callhandle states. For example, during a call, a call request from another
party could arrive. This arrival could be signalled to the user by providing an alert
indication in a dialog box on the user's CMW screen. The user could refuse the call by
clicking on a refuse button in the dialog box or, alternatively, click on a "hold" button on the
active call window to put the current call on hold and allow the incoming call to be
accepted.

The placing of the currently active call on hold can advantageously be
accomplished by changing the caller's callhandle from the active state to a "hold" state,
which permits the caller to answer incoming calls or initiate new calls without releasing
the previous call. Since the connection set-up to the callee will be retained, a call on
hold can conveniently be resumed by the caller clicking on a resume button on the active
call window, which returns the corresponding callhandle to the active state.
Typically, multiple calls can be put on hold in this manner. As an aid in managing calls
that are on hold, the CMW advantageously provides a hold list display, identifying these
on-hold calls and (optionally) the length of time that each party is on hold. A
corresponding face icon could be used to identify each on-hold call. In addition, buttons
could be provided in this hold display which would allow the user to send a
preprogrammed message to a party on hold. For example, this message could advise the
callee when the call will be resumed, or could state that the call is being terminated and
will be reinitiated at a later time.
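The hold list display needs, for each on-hold party, the time at which the call was put on hold so that the elapsed waiting time can be shown. This minimal Python sketch assumes an injectable clock (defaulting to `time.monotonic`) so the behavior can be checked deterministically; `HoldList` is a hypothetical name.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class HoldList:
    """Tracks on-hold parties and how long each has been waiting (hypothetical class)."""
    clock: Callable[[], float] = time.monotonic
    _since: Dict[str, float] = field(default_factory=dict, init=False)

    def hold(self, party: str) -> None:
        self._since[party] = self.clock()

    def resume(self, party: str) -> None:
        del self._since[party]           # callhandle goes back to the active state

    def entries(self) -> List[Tuple[str, float]]:
        """(party, seconds on hold) pairs in the order the calls were held."""
        now = self.clock()
        return [(party, now - t) for party, t in self._since.items()]
```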

Reference is now directed to Figure 24, which diagrammatically
illustrates how two-party calls are connected for CMWs WS-1 and WS-2, located at the
same MLAN 10. As shown in Figure 24, CMWs WS-1 and WS-2 are coupled to the
local A/V Switching Circuitry 30 via ports 81 and 82, respectively. As previously
described, when CMW WS-1 calls CMW WS-2, a callhandle is created for each port. If
CMW WS-2 accepts the call, these two callhandles become active and, in response
thereto, the AVNM causes the A/V Switching Circuitry 30 to set up the appropriate
connections between ports 81 and 82, as indicated by the dashed line 83.

Figure 25 diagrammatically illustrates how two-party calls are connected
for CMWs WS-1 and WS-2 when located in different MLANs 10a and 10b. As
illustrated in Figure 25, CMW WS-1 of MLAN 10a is connected to a port 91a of A/V
Switching Circuitry 30a of MLAN 10a, while CMW WS-2 is connected to a port 91b of
the audio/video switching circuit 30b of MLAN 10b. It will be assumed that MLANs
10a and 10b can communicate with each other via ports 92a and 92b (through respective
WAN gateways 40a and 40b and WAN 15). A call between CMWs WS-1 and WS-2 can
then be established by the AVNM of MLAN 10a, in response to the creation of callhandles at
ports 91a and 92a, setting up appropriate connections between these ports as indicated by
dashed line 93a, and by the AVNM of MLAN 10b, in response to callhandles created at
ports 91b and 92b, setting up appropriate connections between these ports as indicated by
dashed line 93b. Appropriate paths 94a and 94b in WAN gateways 40a and 40b,
respectively, are set up by the WAN network manager 65 (Figure 21) in each network.
CONFERENCE CALLS
Next to be described is the specific manner in which the system
illustrated provides for multi-party conference calls (involving more than two
participants). When a multi-party conference call is initiated, the CMW provides a
screen that is similar to the screen for two-party calls, which displays a live video picture
of the callee's image in a video window. However, for multi-party calls, the screen
includes a video mosaic containing a live video picture of each of the conference
participants (including the CMW user's own picture), as shown, for example, in Figure
8B. Of course, other arrangements could show only the remote conference participants
(and not the local CMW user) in the conference mosaic (or show a mosaic containing
both participants in a two-party call). In addition to the controls shown in Figure 8B, the
multi-party conference screen also includes buttons/menu items that can be used to place
individual conference participants on hold, to remove individual participants from the
conference, to adjourn the entire conference, or to provide a "close-up" image of a single
individual (in place of the video mosaic).

Multi-party conferencing requires all the mechanisms employed for
2-party calls. In addition, it requires the conference bridge manager CBM 64 (Figure
21) and the conference bridges 36 (Figure 3). The CBM acts as a client of the AVNM in
managing the operation of the conference bridges 36. The CBM also acts as a server to
other clients on the network. The CBM makes conferencing services available by
creating service records of type "conference" in the AVNM service database and
associating these services with the ports on A/V Switching Circuitry 30 for connection to
conference bridges 36.

The system provides two ways for initiating a conference call. The first
way is to add one or more parties to an existing two-party call. For this purpose, an
ADD button is provided by both the Collaboration Initiator and the Rolodex, as
illustrated in Figures 2A and 22. To add a new party, a user selects the party to be
added (by clicking on the user's rolodex name or face icon as described above) and clicks
on the ADD button to invite that new party. Additional parties can be invited in a similar
manner. The second way to initiate a conference call is to select the parties in a similar
manner and then click on the CALL button (also provided in the Collaboration Initiator
and Rolodex windows on the user's CMW screen).

Another alternative possibility is to initiate a conference call from the
beginning by clicking on a CONFERENCE/MOSAIC icon/button/menu item on the
CMW screen. This could initiate a conference call with the call initiator as the sole
participant (i.e., causing a conference bridge to be allocated such that the caller's image
also appears on his/her own screen in a video mosaic, which will also include images of
subsequently added participants). New participants could be invited, for example, by
selecting each new party's face icon and then clicking on the ADD button.

Next to be considered with reference to Figures 26 and 27 is the manner 
in which conference calls are handled. For the purposes of this description it will be 
assumed that up to four parties may participate in a conference call. Each conference 
uses four bridge ports 136-1, 136-2, 136-3 and 136-4 provided on A/V Switching 
Circuitry 30a, which are respectively coupled to bidirectional audio/video lines 36-1, 
36-2, 36-3 and 36-4 connected to conference bridge 36. However, from this description
it will be apparent how a conference call may be provided for additional parties, as well
as simultaneously occurring conference calls. 

Once the Collaboration Initiator determines that a conference is to be 
initiated, it queries the AVNM for a conference service. If such a service is available, 
the Collaboration Initiator requests the associated CBM to allocate a conference bridge. 
The Collaboration Initiator then places an audio/video call to the CBM to initiate the 
conference. When the CBM accepts the call, the AVNM couples port 101 of CMW 
WS-1 to lines 36-1 of conference bridge 36 by a connection 137 produced in response to 
callhandles created for port 101 of WS-1 and bridge port 136-1.

When the user of WS-1 selects the appropriate face icon and clicks the
ADD button to invite a new participant to the conference, which will be assumed to be
CMW WS-3, the Collaboration Initiator on WS-1 sends an add request to the CBM. In
response, the CBM calls WS-3 via WS-3 port 103. When the CBM initiates the call, the
AVNM creates callhandles for WS-3 port 103 and bridge port 136-2. When WS-3
accepts the call, its callhandle is made "active", resulting in connection 138 being
provided to connect WS-3 and lines 36-2 of conference bridge 36. Assuming CMW
WS-1 next adds CMW WS-5 and then CMW WS-8, callhandles for their respective ports
and bridge ports 136-3 and 136-4 are created, in turn, as described above for WS-1 and
WS-3, resulting in connections 139 and 140 being provided to connect WS-5 and WS-8
to conference bridge lines 36-3 and 36-4, respectively. The conferees WS-1, WS-3,
WS-5 and WS-8 are thus coupled to conference bridge lines 36-1, 36-2, 36-3 and
36-4, respectively, as shown in Figure 26.

It will be understood that the video mosaicing circuitry 36 and audio
mixing circuitry 38 incorporated in conference bridge 36 operate as previously described,
to form a resulting four-picture mosaic (Figure 8B) that is sent to all of the conference
participants, which in this example are CMWs WS-1, WS-3, WS-5 and WS-8. Users
may leave a conference by just hanging up, which causes the AVNM to delete the
associated callhandles and to send a hangup notification to the CBM. When the CBM receives
the notification, it notifies all other conference participants that the participant has exited.
This results in the departed participant's portion of the video mosaic image being blacked
out on the screens of all remaining participants.
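The bridge-port bookkeeping walked through above (each added conferee bound to the next bridge port, a hang-up freeing that port and blacking out its mosaic quadrant) can be sketched as follows. This minimal Python sketch assumes a first-free-port allocation policy and represents mosaic quadrants by bridge-port order; `ConferenceBridge` and its methods are hypothetical, not the CBM's actual interface.

```python
from typing import Dict, List, Optional

class ConferenceBridge:
    """Sketch of a four-port bridge as managed by the CBM (names are hypothetical)."""

    def __init__(self, bridge_ports: List[str]) -> None:
        self.slots: Dict[str, Optional[str]] = {p: None for p in bridge_ports}

    def add(self, workstation: str) -> str:
        """Bind a workstation to the first free bridge port and return that port."""
        for port, occupant in self.slots.items():
            if occupant is None:
                self.slots[port] = workstation
                return port
        raise RuntimeError("conference bridge is full")

    def hang_up(self, workstation: str) -> str:
        """Free the participant's port; its mosaic quadrant goes black."""
        for port, occupant in self.slots.items():
            if occupant == workstation:
                self.slots[port] = None
                return port
        raise KeyError(workstation)

    def mosaic(self) -> List[str]:
        """Quadrant contents in bridge-port order; 'black' marks an empty quadrant."""
        return [occ if occ is not None else "black" for occ in self.slots.values()]
```

Adding WS-1, WS-3, WS-5 and WS-8 in turn fills bridge ports 136-1 to 136-4 in order, matching the sequence in Figure 26.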

The manner in which the CBM and the conference bridge 36 operate
when conference participants are located at different sites will be evident from the
previously described operation of the cut-and-paste circuitry 39 (Figure 10) with the
video mosaicing circuitry 36 (Figure 7) and audio mixing circuitry 38 (Figure 9). In
such case, each incoming single video picture or mosaic from another site is connected to
a respective one of the conference bridge lines 36-1 to 36-4 via WAN gateway 40.

The situation in which a two-party call is converted to a conference call
will next be considered in connection with Figure 27 and the previously considered
2-party call illustrated in Figure 24. Converting this 2-party call to a conference requires
that this two-party call (such as illustrated between WS-1 and WS-2 in Figure 24) be
rerouted dynamically so as to be coupled through conference bridge 36. When the user
of WS-1 clicks on the ADD button to add a new party (for example, WS-5), the
Collaboration Initiator of WS-1 sends a redirect request to the AVNM, which cooperates
with the CBM to break the two-party connection 83 in Figure 24, and then redirect the
callhandles created for ports 81 and 82 to callhandles created for bridge ports 136-1 and
136-2, respectively.
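The redirect step can be viewed as a transformation of the switch's connection set: the direct port-to-port connection is removed and each end is reconnected to its own bridge port. This minimal Python sketch assumes connections are modeled as unordered pairs stored as tuples of port names; `redirect_to_bridge` is a hypothetical helper and does not model the callhandle bookkeeping itself.

```python
from typing import Set, Tuple

def redirect_to_bridge(connections: Set[Tuple[str, str]], a: str, b: str,
                       bridge_ports: Tuple[str, str]) -> Set[Tuple[str, str]]:
    """Replace the direct a<->b connection with a<->bridge[0] and b<->bridge[1]."""
    direct = (a, b) if (a, b) in connections else (b, a)
    if direct not in connections:
        raise KeyError("no two-party connection between these ports")
    updated = set(connections)                # leave the caller's set untouched
    updated.remove(direct)                    # break the two-party connection (83)
    updated.add((a, bridge_ports[0]))         # e.g. connection 86 to bridge port 136-1
    updated.add((b, bridge_ports[1]))         # e.g. connection 87 to bridge port 136-2
    return updated
```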

As shown in Figure 27, this results in producing a connection 86
between WS-1 and bridge port 136-1, and a connection 87 between WS-2 and bridge port
136-2, thereby creating a conference set-up between WS-1 and WS-2. Additional
conference participants can then be added as described above for the situations
in which the conference is initiated by the user of WS-1 either selecting multiple
participants initially or merely selecting a "conference" and then adding subsequent
participants.

Having described the preferred manner in which two-party calls and
conference calls are set up, the preferred manner in which data conferencing is provided
between CMWs will next be described.



DATA CONFERENCING 
Data conferencing is implemented by certain Snapshot Sharing software
provided at the CMW (see Figure 20). This software permits a "snapshot" of a selected
portion of a participant's CMW screen (such as a window) to be displayed on the CMW
screens of other selected participants (whether or not those participants are also involved
in a videoconference). Any number of snapshots may be shared simultaneously. Once a
snapshot is displayed, any participant can telepoint on or annotate it, and these animated
actions and results will appear (virtually simultaneously) on the screens of all other
participants. The annotation capabilities provided include lines of several different
widths and text of several different sizes. Also, to facilitate participant identification,
these annotations may be provided in a different color for each participant. Any
annotation may also be erased by any participant. Figure 2B (lower left window)
illustrates a CMW screen having a shared graph on which participants have drawn and
typed to call attention to or supplement specific portions of the shared image.

A participant may initiate data conferencing with selected participants
(selected and added as described above for videoconference calls) by clicking on a
SHARE button on the screen (available in the Rolodex or Collaboration Initiator
windows, shown in Figure 2A, as are the CALL and ADD buttons), followed by selection of
the window to be shared. When a participant clicks on his SHARE button, his
Collaboration Initiator module 161 (Figure 20) queries the AVNM to locate the
Collaboration Initiators of the selected participants, resulting in invocation of their
respective Snapshot Sharing modules 164. The Snapshot Sharing software modules at
the CMWs of each of the selected participants query their local operating system 180 to
determine available graphic formats, and then send this information to the initiating
Snapshot Sharing module, which determines the format that will produce the most
advantageous display quality and performance for each selected participant.
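The per-participant format choice described above can be sketched as a simple preference search. This is a minimal sketch, not the patent's method: the format names and their ranking (`PREFERENCE`) are hypothetical, and a real implementation would likely weigh display quality against transfer cost rather than consult a fixed list.

```python
# Hypothetical preference ranking, best first; the source names no formats.
PREFERENCE = ["24-bit color", "8-bit color", "grayscale", "1-bit monochrome"]


def choose_format(available):
    """Return the most preferred format a participant's OS reports."""
    for fmt in PREFERENCE:
        if fmt in available:
            return fmt
    raise ValueError("no common graphic format")


def negotiate(reports):
    """Map each participant to the format chosen for that participant."""
    return {who: choose_format(formats) for who, formats in reports.items()}
```

Because the choice is made separately for each participant, a CMW with a limited display does not force every other participant down to its format.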

After the snapshot to be shared is displayed on all CMWs, each
participant may telepoint on or annotate the snapshot, and these actions and their results are
displayed on the CMW screens of all participants. This is preferably accomplished by
monitoring the actions made at the CMW (e.g., by tracking mouse movements) and
sending these "operating system commands" to the CMWs of the other participants,
rather than continuously exchanging bitmaps, as would be the case with traditional
"remote control" products.

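The contrast with bitmap exchange can be illustrated with a toy model in which each CMW's Share window keeps a list of small annotation commands, and every local action is relayed to the peers as a command rather than as an updated image. The `Command` and `ShareWindow` names, the command kinds, and the in-process peer list (standing in for the network) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Command:
    """A compact description of one annotation action."""
    author: str
    kind: str        # e.g. "line", "text" or "telepoint"
    payload: tuple   # e.g. endpoints, or a position plus a string


@dataclass
class ShareWindow:
    owner: str
    history: list = field(default_factory=list)  # commands applied, in order
    # Other participants' windows; excluded from repr/eq to avoid cycles.
    peers: list = field(default_factory=list, repr=False, compare=False)

    def local_action(self, kind, payload):
        # Apply the action locally, then relay the same small command to
        # every peer instead of sending an updated bitmap.
        cmd = Command(self.owner, kind, payload)
        self.apply(cmd)
        for peer in self.peers:
            peer.apply(cmd)
        return cmd

    def apply(self, cmd):
        # A real window would render the command over the snapshot here.
        self.history.append(cmd)
```

Each command is a few bytes, whereas a bitmap update scales with the size of the shared window.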
As illustrated in Figure 28, the original unchanged snapshot is stored in
a first bitmap 210a. A second bitmap 210b stores the combination of the original
snapshot and any annotations. Thus, when desired (e.g., by clicking on a CLEAR button
located in each participant's Share window, as illustrated in Figure 2B), the original
unchanged snapshot can be restored (i.e., erasing all annotations) using bitmap 210a.
Selective erasures can be accomplished by copying the corresponding portion of bitmap
210a into (i.e., restoring) the desired erased area of bitmap 210b.
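The two-bitmap scheme of Figure 28 is straightforward to model. In this minimal sketch, bitmaps are assumed to be lists of pixel rows; `original` plays the role of bitmap 210a and `annotated` that of bitmap 210b. The class and method names are hypothetical.

```python
import copy


class AnnotatedSnapshot:
    """Keeps an untouched copy (210a) beside an annotated copy (210b)."""

    def __init__(self, pixels):
        self.original = copy.deepcopy(pixels)    # bitmap 210a, never modified
        self.annotated = copy.deepcopy(pixels)   # bitmap 210b, snapshot + annotations

    def draw(self, x, y, value):
        self.annotated[y][x] = value

    def clear(self):
        # CLEAR: restore the unchanged snapshot, erasing all annotations.
        self.annotated = copy.deepcopy(self.original)

    def erase(self, x0, y0, x1, y1):
        # Selective erase: copy region [x0, x1) x [y0, y1) back from 210a.
        for y in range(y0, y1):
            self.annotated[y][x0:x1] = self.original[y][x0:x1]
```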

Rather than causing a new Share window to be created whenever a
snapshot is shared, it is possible to replace the contents of an existing Share window with
a new image. This can be achieved in either of two ways. First, the user can click on
the GRAB button and then select a new window whose contents should replace the
contents of the existing Share window. Second, the user can click on the REGRAB
button to cause a (presumably modified) version of the original source window to replace
the contents of the existing Share window. This is particularly useful when one
participant desires to share a long document that cannot be displayed on the screen in its
entirety. For example, the user might display the first page of a spreadsheet on his
screen, use the SHARE button to share that page, discuss and perhaps annotate it, then
return to the spreadsheet application to position to the next page, use the REGRAB
button to share the new page, and so on. This mechanism represents a simple, effective
step toward application sharing.

Further, instead of sharing a snapshot of data on his current screen, a
user may choose to share a snapshot that had previously been saved as a file.
This is achieved via the LOAD button, which causes a dialog box to appear, prompting
the user to select a file. Conversely, via the SAVE button, any snapshot may be saved
with all current annotations.

The capabilities described above were carefully selected to be
particularly effective in environments where the principal goal is to share existing
information, rather than to create new information. In particular, user interfaces are
designed to make snapshot capture, telepointing and annotation extremely easy to use.
Nevertheless, it is also to be understood that, instead of sharing snapshots, a blank
"whiteboard" can also be shared (via the WHITEBOARD button provided by the
Rolodex, Collaboration Initiator, and active call windows), and that more complex
paintbox capabilities could easily be added for application areas that require such
capabilities.

As pointed out previously herein, important features of the present
system reside in the manner in which the capabilities and advantages of multimedia mail
(MMM), multimedia conference recording (MMCR), and multimedia document
management (MMDM) are tightly integrated with audio/video/data teleconferencing to
provide a multimedia collaboration system that facilitates a markedly higher level of
communication and collaboration between geographically dispersed users than has
heretofore been achievable by known prior art systems. Figure 29 is a schematic and
diagrammatic view illustrating how multimedia calls/conferences, MMCR, MMM and
MMDM work together to provide the above-described features. The MM Editing Utilities
shown supplementing MMM and MMDM may be identical.

Having already described various examples of audio/video/data
teleconferencing, next to be considered are various ways of integrating MMCR, MMM
and MMDM with audio/video/data teleconferencing. For this purpose, basic preferred
approaches and features of each will be considered along with preferred associated
hardware and software.

MULTIMEDIA DOCUMENTS 

In one arrangement, the creation, storage, retrieval and editing of
multimedia documents serve as the basic element common to MMCR, MMM and
MMDM. Accordingly, the system advantageously provides a universal format for
multimedia documents. This format defines multimedia documents as a collection of
individual components in multiple media combined with an overall structure and timing
component that captures the identities, detailed dependencies, references to, and
relationships among the various other components. The information provided by this
structuring component forms the basis for spatial layout, order of presentation,
hyperlinks, temporal synchronization, etc., with respect to the composition of a
multimedia document. Figure 30 shows the structure of such documents as well as their
relationship with editing and storage facilities.

Each of the components of a multimedia document uses its own editors
for creating, editing, and viewing. In addition, each component may use dedicated
storage facilities. Multimedia documents are advantageously structured for authoring,
storage, playback and editing by storing some data under conventional file systems and
some data in special-purpose storage servers, as will be discussed later. The
Conventional File System 504 can be used to store all non-time-sensitive portions of a
multimedia document. In particular, the following are examples of non-time-sensitive
data that can be stored in a conventional type of computer file system:

1. structured and unstructured text
2. raster images
3. structured graphics and vector graphics (e.g., PostScript)
4. references to files in other file systems (video, hi-fidelity audio, etc.) via
pointers
5. restricted forms of executables
6. structure and timing information for all of the above (spatial layout,
order of presentation, hyperlinks, temporal synchronization, etc.)
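One way to picture the universal document format is as a set of media components plus a separate structure and timing component that records where and when each one appears. The sketch below is illustrative only; the class names, fields, and the rule in `storage_for` (which assigns only high-quality audio/video to a real-time storage server and everything else to the Conventional File System 504) are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical media types assumed to need the real-time storage server.
REALTIME_MEDIA = {"hq-video", "hq-audio"}


@dataclass
class Component:
    name: str
    media: str      # e.g. "text", "raster", "window-events", "hq-video"
    location: str   # local file name, or a pointer into another store


@dataclass
class Placement:
    """One entry of the structure and timing component."""
    component: str
    start: float        # seconds from document start (presentation order)
    region: tuple       # (x, y, width, height) spatial layout
    links: tuple = ()   # names of hyperlinked components


@dataclass
class MultimediaDocument:
    components: dict = field(default_factory=dict)
    structure: list = field(default_factory=list)

    def add(self, comp, start, region, links=()):
        self.components[comp.name] = comp
        self.structure.append(Placement(comp.name, start, region, tuple(links)))

    def presentation_order(self):
        return [p.component for p in sorted(self.structure, key=lambda p: p.start)]


def storage_for(comp):
    """Route a component to the file system or the real-time server."""
    return "realtime-server" if comp.media in REALTIME_MEDIA else "file-system"
```

Keeping layout and timing in `structure` rather than inside each component lets each medium keep its own editor and storage while the document as a whole is still presented in order.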

Of particular importance in multimedia documents is support for
time-sensitive media and media that have synchronization requirements with other media
components. Some of these time-sensitive media can be stored on conventional file
systems while others may require special-purpose storage facilities.

Examples of time-sensitive media that can be stored on conventional file
systems are small audio files and short or low-quality video clips (e.g., as might be
produced using QuickTime or Video for Windows). Other examples include window
event lists as supported by the Window-Event Record and Play system 512 shown in
Figure 30. This component allows for storing and replaying a user's interactions with
application programs by capturing the requests and events exchanged between the client
program and the window system in a time-stamped sequence. After this "record" phase,
the resulting information is stored in a conventional file that can later be retrieved and
"played" back. During playback the same sequence of window system requests and
events recurs with the same relative timing as when they were recorded. In prior-art
systems, this capability has been used for creating automated demonstrations. In the
present system it can be used, for example, to reproduce annotated snapshots as they
occurred at recording.
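The record-and-play behavior, re-issuing events with the same relative timing, can be sketched as follows. The clock and sleep functions are injectable so the timing logic can be exercised without actually waiting; `EventRecorder`, `play`, and the event representation are hypothetical, not the interfaces of system 512.

```python
import time


class EventRecorder:
    """Record window events with offsets from the start of recording."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.events = []   # list of (offset_seconds, event)
        self._t0 = None

    def start(self):
        self._t0 = self.clock()

    def record(self, event):
        if self._t0 is None:
            raise RuntimeError("call start() before record()")
        self.events.append((self.clock() - self._t0, event))


def play(events, deliver, clock=time.monotonic, sleep=time.sleep):
    """Re-issue each event at its recorded offset from the start of playback."""
    t0 = clock()
    for offset, event in events:
        delay = offset - (clock() - t0)   # time remaining until the event is due
        if delay > 0:
            sleep(delay)
        deliver(event)
```

Measuring each delay against the playback start, rather than sleeping for the gap between consecutive events, keeps small per-event delivery costs from accumulating into drift.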

As described above in connection with collaborative workstation
software, Snapshot Share 518 shown in Figure 30 is a utility used in multimedia calls and
conferencing for capturing window or screen snapshots, sharing them with one or more call or
conference participants, and permitting group annotation, telepointing, and re-grabs.
Here, this utility is adapted so that its captured images and window events can be
recorded by the Window-Event Record and Play system 512 while being used by only
one person. By synchronizing events associated with a video or audio stream to specific
frame numbers or time codes, a multimedia call or conference can be recorded and
reproduced in its entirety. Similarly, the same functionality is preferably used to create
multimedia mail whose authoring steps are virtually identical to participating in a
multimedia call or conference (though other forms of MMM are not precluded).

Some time-sensitive media require dedicated storage servers in order to 
satisfy real-time requirements. High-quality audio/video segments, for example, require 
dedicated real-time audio/video storage servers. A preferred example of such a server 
will be described later. Next to be considered is how the current system guarantees 
synchronization between different media components. 

MEDIA SYNCHRONIZATION 

A preferred manner for providing multimedia synchronization will next
be considered. Only multimedia documents with real-time material need include
synchronization functions and information. Synchronization for such situations may be
provided as described below.

Audio or video segments can exist without being accompanied by the
other. If audio and video are recorded simultaneously ("co-recorded"), the system
allows their streams to be recorded and played back with automatic
synchronization, as would result from conventional VCRs, laserdisks, or time-division
multiplexed ("interleaved") audio/video streams. This eliminates the need to tightly
synchronize (i.e., "lip sync") separate audio and video sequences. Rather, reliance is on
the co-recording capability of the Real-Time Audio/Video Storage Server 502 to deliver
all closely synchronized audio and video directly at its signal outputs.

Each recorded video sequence is tagged with time codes (e.g., SMPTE at
1/30 second intervals) or video frame numbers. Each recorded audio sequence is tagged
with time codes (e.g., SMPTE or MIDI) or, if co-recorded with video, video frame
numbers.
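Because sequences may be tagged either with frame numbers or with time codes, converting between the two is a basic operation. This sketch assumes non-drop-frame SMPTE time code at exactly 30 frames per second. NTSC video (29.97 frames per second) normally uses drop-frame time code, which this sketch does not handle.

```python
FPS = 30  # assumption: non-drop-frame time code at exactly 30 frames/s


def frame_to_timecode(frame):
    """Convert a frame count from the start of a recording to HH:MM:SS:FF."""
    seconds, ff = divmod(frame, FPS)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def timecode_to_frame(timecode):
    """Inverse of frame_to_timecode, e.g. for a frame-accurate "go-to"."""
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * FPS + ff
```

For example, frame 1799 corresponds to `00:00:59:29`, the last frame of the first minute.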

The system also provides synchronization between window events and
audio and/or video streams. The following functions are supported:

1. Media-time-driven synchronization: synchronization of window
events to an audio, video, or audio/video stream, using the
real-time media as the timing source.

2. Machine-time-driven synchronization:

a. synchronization of window events to the system clock

b. synchronization of the start of an audio, video, or
audio/video segment to the system clock



If no audio or video is involved, machine-time-driven synchronization is
used throughout the document. Whenever audio and/or video is playing,
media-time-driven synchronization is used. The system supports transitions between
machine-time and media-time synchronization whenever an audio/video segment is
started or stopped.
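The hand-over between the two timing sources can be modeled as a single document clock whose source changes whenever a segment starts or stops. In this sketch the media position is assumed to be reported in seconds by a callable, and document time is kept continuous across each transition; `DocumentClock` and its methods are hypothetical names.

```python
class DocumentClock:
    """Document time from the system clock, or from the playing media."""

    def __init__(self, system_clock):
        self.system_clock = system_clock
        self._anchor_doc = 0.0             # document time at the last transition
        self._anchor_sys = system_clock()  # system time at the last transition
        self._media_position = None        # callable while a segment is playing

    def now(self):
        if self._media_position is not None:
            # Media-time-driven: the playing segment is the timing source.
            return self._anchor_doc + self._media_position()
        # Machine-time-driven: elapsed system time since the last transition.
        return self._anchor_doc + (self.system_clock() - self._anchor_sys)

    def start_media(self, media_position):
        # Anchor so that document time does not jump when the source changes.
        self._anchor_doc = self.now() - media_position()
        self._media_position = media_position

    def stop_media(self):
        self._anchor_doc = self.now()
        self._anchor_sys = self.system_clock()
        self._media_position = None
```

Window events would be scheduled against `now()`, so while audio or video plays they follow the media even if its playback rate drifts from the system clock.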

As an example, viewing a multimedia document might proceed as
follows:

• The document starts with an annotated share (machine-time-driven
synchronization).
• Next, audio only (a "voice annotation") starts as text and graphical
annotations on the share continue (audio is the timing source for window
events).
• Audio ends, but annotations continue (machine-time-driven
synchronization).
• Next, co-recorded audio/video starts, continuing with further annotations
on the same share (audio is the timing source for window events).
• Next, a new share starts during the continuing audio/video recording;
annotations happen on both shares (audio is the timing source for window
events).
• Audio/video stops, and annotations on both shares continue
(machine-time-driven synchronization).
• The document ends.

AUDIO/VIDEO STORAGE

As described above, the present system can include many
special-purpose servers that provide storage of time-sensitive media (e.g., audio/video
streams) and support coordination with other media. This section describes the preferred
arrangement for audio/video storage and recording services.

Although storage and recording services could be provided at each
CMW, it is preferable to employ a centralized server 502 coupled to MLAN 10, as
illustrated in Figure 31. A centralized server 502, as shown in Figure 31, provides the
following advantages:

1. The total amount of storage hardware required can be far less (due to
better utilization resulting from statistical averaging).

2. Bulky and expensive compression/decompression hardware can be
pooled on the storage servers and shared by multiple clients. As a
result, fewer compression/decompression engines of higher performance
are required than if each workstation were equipped with its own
compression/decompression hardware.

3. Also, more costly centralized codecs can be used to transfer mail wide
area among campuses at far lower cost than attempting to use data
WAN technologies.

4. File system administration (e.g., backups, file system replication,
etc.) is far less costly and achieves higher performance.

The Real-Time Audio/Video Storage Server 502 shown in Figure 31A
structures and manages the audio/video files recorded and stored on its storage devices.
Storage devices may typically include computer-controlled VCRs, as well as rewritable
magnetic or optical disks. For example, server 502 in Figure 31A includes disks 60e for
recording and playback. Analog information is transferred between disks 60e and the
A/V Switching Circuitry 30 via analog I/O 62. Control is provided by control 64
coupled to Data LAN hub 25.

At a high level, the centralized audio/video storage and playback server
502 in Figure 31A performs the following functions:

File Management

It provides mechanisms for creating, naming, time-stamping, storing, 
retrieving, copying, deleting, and playing back some or all portions of an audio/video 
file. 

File Transfer and Replication

The audio/video file server supports replication of files on different
disks managed by the same file server to facilitate simultaneous access to the same files.
Moreover, file transfer facilities are provided to support transmission of audio/video files
between itself and other audio/video storage and playback engines. File transfer can also
be achieved by using the underlying audio/video network facilities: servers establish a
real-time audio/video network connection between themselves so one server can "play
back" a file while the second server simultaneously records it.

Disk Management 

The storage facilities support specific disk allocation, garbage collection
and defragmentation facilities. They also support mapping disks with other disks (for
replication and staging modes, as appropriate) and mapping disks, via I/O equipment,
with the appropriate Video/Audio network port.

Synchronization support 

Synchronization between audio and video is ensured by the multiplexing
scheme used by the storage media, typically by interleaving the audio and video streams
in a time-division-multiplexed fashion. Further, if synchronization is required with other
stored media (such as window system graphics), then frame numbers, time codes, or
other timing events are generated by the storage server. An advantageous way of
providing this synchronization is to synchronize record and playback to received frame
number or time code events.
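Time-division multiplexing of this kind can be illustrated by merging two time-stamped streams into one ordered stream and splitting it again on playback. This sketch assumes each input is already sorted by timestamp; the chunk format and the rule that video precedes audio on equal timestamps are arbitrary choices, not taken from the source.

```python
import heapq


def interleave(video, audio):
    """Multiplex two lists of (timestamp, payload) chunks into one list of
    (timestamp, kind, payload) ordered by timestamp; video wins ties."""
    tagged_video = ((t, 0, i, "V", p) for i, (t, p) in enumerate(video))
    tagged_audio = ((t, 1, i, "A", p) for i, (t, p) in enumerate(audio))
    # The priority and index fields keep the merge from comparing payloads.
    merged = heapq.merge(tagged_video, tagged_audio)
    return [(t, kind, p) for t, _, _, kind, p in merged]


def demultiplex(stream):
    """Recover the separate audio and video streams on playback."""
    video = [(t, p) for t, kind, p in stream if kind == "V"]
    audio = [(t, p) for t, kind, p in stream if kind == "A"]
    return video, audio
```

Since audio and video chunks for the same instant sit next to each other in the stored stream, reading the stream sequentially delivers them together without separate lip-sync logic.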

Searching 

To support intra-file searching, at least start, stop, pause, fast forward,
reverse, and fast reverse operations are provided. To support inter-file searching,
audio/video tagging, or more generalized "go-to" operations and mechanisms, such as
frame numbers or time code, are supported at a search-function level.

Connection Management 

The server handles requests for audio/video network connections from 
client programs (such as video viewers and editors running on client workstations) for 
real-time recording and real-time playback of audio/video files. 

Next to be considered is how centralized audio/video storage servers 
provide for real-time recording and playback of video streams. 

Real-Time Disk Delivery
To support real-time audio/video recording and playback, the storage
server needs to provide a real-time transmission path between the storage medium and
the appropriate audio/video network port for each simultaneous client accessing the
server. For example, if one user is viewing a video file at the same time several other
people are creating and storing new video files on the same disk, multiple simultaneous
paths to the storage media are required. Similarly, video mail sent to large distribution
groups, video databases, and similar functions may also require simultaneous access to
the same video files, again imposing multiple access requirements on the video storage
capabilities.

For storage servers that are based on computer-controlled VCRs or
rewritable laserdisks, a real-time transmission path is readily available through the direct
analog connection between the disk or tape and the network port. However, because of
this single direct connection, each VCR or laserdisk can only be accessed by one client
program at a time (multi-head laserdisks are an exception). Therefore, storage
servers based on VCRs and laserdisks are difficult to scale for multiple access usage.
Multiple access to the same material is provided by file replication and staging, which
greatly increases storage requirements and the need for moving information quickly
among storage media units serving different users.

Video systems based on magnetic disks are more readily scalable for
simultaneous use by multiple people. A generalized hardware implementation of such a
scalable storage and playback system 502 is illustrated in Figure 32. Individual I/O cards
530 supporting digital and analog I/O are linked by intra-chassis digital networking (e.g.,
buses) for file transfer within chassis 532 holding some number of these cards. Multiple
chassis 532 are linked by inter-chassis networking. The Digital Video Storage System
available from Parallax Graphics is an example of such a system implementation.

The bandwidth available for the transfer of files among disks is
ultimately limited by the bandwidth of these intra-chassis and inter-chassis networks.
For systems that use sufficiently powerful video compression schemes, real-time delivery
requirements for a small number of users can be met by existing file system software
(such as the Unix file system), provided that the block size of the storage system is
optimized for video storage and that sufficient buffering is provided by the operating
system software to guarantee continuous flow of the audio/video data.
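The buffering condition can be made concrete with a rough bound: to keep a constant-rate stream playing, the buffer must hold at least as much data as the stream consumes during the longest gap between successive disk reads. The function below rounds that amount up to whole storage blocks. It is a simplified model (constant bit rate, known worst-case gap) assumed for illustration, not a sizing rule from the source.

```python
import math


def min_buffer_bytes(stream_bits_per_s, worst_gap_s, block_bytes):
    """Smallest whole-block buffer that covers the worst-case read gap.

    Simplified model: constant bit rate and a known upper bound on the
    time between disk deliveries for this stream.
    """
    needed = stream_bits_per_s / 8 * worst_gap_s   # bytes consumed during the gap
    return math.ceil(needed / block_bytes) * block_bytes
```

For instance, a hypothetical 1.5 Mbit/s stream with a 0.2 s worst-case gap consumes 37,500 bytes, so a single 64 KiB block suffices; an 8 Mbit/s stream with a 0.5 s gap needs eight such blocks. The total requirement grows with the number of simultaneous clients, which is one reason heavier usage calls for the special-purpose solutions described next.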

Special-purpose software/hardware solutions can be provided to
guarantee higher performance under heavier usage or higher bandwidth conditions. For
example, a higher-throughput version of Figure 32 is illustrated in Figure 33, which uses
crosspoint switching, such as that provided by SCSI Crossbar 540, to increase the total
bandwidth of the inter-chassis and intra-chassis networks, thereby increasing the number
of possible simultaneous file transfers.



Real-Time Network Delivery
By using the same audio/video format as used for audio/video
teleconferencing, the audio/video storage system can leverage the previously described
network facilities: the MLANs 10 can be used to establish a multimedia network
connection between client workstations and the audio/video storage servers.
Audio/video editors and viewers running on the client workstation use the same software
interfaces as the multimedia teleconferencing system to establish these network
connections.

The resulting architecture is shown in Figure 31B. Client workstations
use the existing audio/video network to connect to the storage server's network ports.
These network ports are connected to compression/decompression engines that plug into
the server bus. These engines compress the audio/video streams that come in over the
network and store them on the local disk. Similarly, for playback, the server reads
stored video segments from its local disk and routes them through the decompression
engines back to client workstations for local display.

The present system allows for alternative delivery strategies. For
example, some compression algorithms are asymmetric, meaning that decompression
requires much less compute power than compression. In some cases, real-time
decompression can even be done in software, without requiring any special-purpose
decompression hardware. As a result, there is no need to decompress stored audio and
video on the storage server and play it back in real time over the network. Instead, it can
be more efficient to transfer an entire audio/video file from the storage server to the
client workstation, cache it on the workstation's disk, and play it back locally. These
observations lead to a modified architecture as presented in Figure 31C. In this
architecture, clients interact with the storage server as follows:

• To record video, clients set up real-time audio/video network
connections to the storage server as before (this connection could make
use of an analog line).
• In response to a connection request, the storage server allocates a
compression module to the new client.
• As soon as the client starts recording, the storage server routes the
output from the compression hardware to an audio/video file allocated
on its local storage devices.
• For playback, this audio/video file gets transferred over the data
network to the client workstation and pre-staged on the workstation's
local disk.
• The client uses local decompression software and/or hardware to play
back the audio/video on its local audio and video hardware.

This approach frees up audio/video network ports and
compression/decompression engines on the server. As a result, the server can be scaled to
support a higher number of simultaneous recording sessions, thereby further reducing the
cost of the system. Note that such an architecture can be employed for reasons other
than compression/decompression asymmetry (such as the economics of the technology of
the day, the existing embedded base in the enterprise, etc.).
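The Figure 31C interaction can be sketched as a server that lends out pooled compression modules only for recording, while playback becomes an ordinary file transfer into a client-side cache. Everything here is a hypothetical stand-in: the class names, the module identifiers, and the use of byte concatenation in place of real compression hardware.

```python
class StorageServer:
    """Hypothetical model of the Figure 31C storage server."""

    def __init__(self, compression_modules):
        self.free_modules = list(compression_modules)  # pooled compression engines
        self.sessions = {}                             # client -> allocated module
        self.files = {}                                # file name -> stored bytes

    def connect(self, client):
        # A recording connection request allocates a compression module.
        if not self.free_modules:
            raise RuntimeError("no compression module available")
        self.sessions[client] = self.free_modules.pop()

    def record(self, client, name, frames):
        if client not in self.sessions:
            raise KeyError(f"{client} has no recording connection")
        # Stand-in for the compression hardware: store the frames as one file.
        self.files[name] = b"".join(frames)

    def disconnect(self, client):
        # Return the module to the pool for the next recording session.
        self.free_modules.append(self.sessions.pop(client))

    def fetch(self, name):
        # Playback path: plain file transfer over the data network.
        return self.files[name]


class Client:
    def __init__(self):
        self.cache = {}   # files pre-staged on the workstation's local disk

    def play(self, server, name):
        if name not in self.cache:
            self.cache[name] = server.fetch(name)
        # Local decompression and display would happen here (not modeled).
        return self.cache[name]
```

Playback never touches `free_modules`, which is the point of this arrangement: server compression engines are tied up only while a session is recording.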

MULTIMEDIA CONFERENCE RECORDING 

Multimedia conference recording (MMCR) will next be considered. For
full-feature multimedia desktop calls and conferencing (e.g., audio/video calls or
conferences with snapshot share), recording (storage) capabilities are preferably provided
for audio and video of all parties, and also for all shared windows, including any
telepointing and annotations provided during the teleconference. Using the multimedia
synchronization facilities described above, these capabilities are provided in a way such
that they can be replayed with accurate correspondence in time to the recorded audio and
video, such as by synchronizing to frame numbers or time code events.

A preferred way of capturing audio and video from calls would be to
record all calls and conferences as if they were multi-party conferences (even for
two-party calls), using video mosaicing, audio mixing and cut-and-pasting, as previously
described in connection with Figures 7-11. It will be appreciated that MMCR as
described will advantageously permit users at their desktop to review real-time
collaboration as it previously occurred, including during a later teleconference. The
output of an MMCR session is a multimedia document that can be stored, viewed, and
edited using the multimedia document facilities described earlier.

Figure 31D shows how conference recording relates to the various
system components described earlier. The Multimedia Conference Record/Play system
522 provides the user with the additional GUIs (graphical user interfaces) and other
functions required to provide the previously described MMCR functionality.

The Conference Invoker 518 shown in Figure 31D is a utility that
coordinates the audio/video calls that must be made to connect the audio/video storage
server 502 with special recording outputs on conference bridge hardware (35 in Figure
3). The resulting recording is linked to information identifying the conference, a
function also performed by this utility.

MULTIMEDIA MAIL 

Now considering multimedia mail (MMM), it will be understood that
MMM adds to the above-described MMCR the capability of delivering delayed
collaboration, as well as the additional ability to review the information multiple times
and, as described hereinafter, to edit, re-send, and archive it. The captured information
is preferably a superset of that captured during MMCR, except that no other user is
involved and the user is given a chance to review and edit before sending the message.

The Multimedia Mail system 524 in Figure 31D provides the user with
the additional GUIs and other functions required to provide the previously described
MMM functionality. Multimedia Mail relies on a conventional Email system 506 shown
in Figure 31D for creating, transporting, and browsing messages. However, multimedia
document editors and viewers are used for creating and viewing message bodies.
Multimedia documents (as described above) consist of time-insensitive components and
time-sensitive components. The Conventional Email system 506 relies on the
Conventional File system 504 and Real-Time Audio/Video Storage Server 502 for
storage support. The time-insensitive components are transported within the
Conventional Email system 506, while the real-time components may be separately
transported through the audio/video network using file transfer utilities associated with
the Real-Time Audio/Video Storage Server 502.

MULTIMEDIA DOCUMENT MANAGEMENT 
Multimedia document management (MMDM) provides long-term,
high-volume storage for MMCR and MMM. The MMDM system assists in providing
the following capabilities to a CMW user:

Multimedia documents can be authored as mail in the MMM system or 
as call/conference recordings in the MMCR system and then passed on 
to the MMDM system. 

To the degree supported by external compatible multimedia editing and 
authoring systems, multimedia documents can also be authored by 
means other than MMM and MMCR. 

Multimedia documents stored within the MMDM system can be 
reviewed and searched. 

Multimedia documents stored within the MMDM system can be used as 
material in the creation of subsequent MMM. 

Multimedia documents stored within the MMDM system can be edited 
to create other multimedia documents. 

The Multimedia Document Management system 526 in Figure 31D
provides the user with the additional GUIs and other functions required to provide the
previously described MMDM functionality. The MMDM includes sophisticated
searching and editing capabilities in connection with the MMDM multimedia document
such that a user can rapidly access desired selected portions of a stored multimedia
document. The Specialized Search system 520 in Figure 30 comprises utilities that allow
users to do more sophisticated searches across and within multimedia documents. This
includes context-based and content-based searches (employing operations such as speech
and image recognition, information filters, etc.), time-based searches, and event-based
searches (window events, call management events, speech/audio events, etc.).

CLASSES OF COLLABORATION

The resulting multimedia collaboration environment achieved by the
above-described integration of audio/video/data teleconferencing, MMCR, MMM and
MMDM is illustrated in Figure 34. It will be evident that each user can collaborate with
other users in real time despite separations in space and time. In addition, collaborating
users can access information already available within their computing and information
systems, including information captured from previous collaborations. Note in Figure 34
that space and time separations are supported in the following ways:

1. Same time, different place
Multimedia calls and conferences

2. Different time, same place
MMDM access to stored MMCR and MMM information, or use of
MMM directly (i.e., copying mail to oneself)

3. Different time, different place
MMM

4. Same time, same place
Collaborative, face-to-face, multimedia document creation.

By use of the same user interfaces and network functions, the present
system smoothly spans all of these venues.

REMOTE ACCESS TO EXPERTISE 

In order to illustrate how the system may be implemented and operated,
an exemplary preferred system will be described having features applicable to the
aforementioned scenario involving remote access to expertise. It is to be understood that
the system may be adapted for other applications (such as in engineering and
manufacturing) or uses having more or less hardware, software and operating features
and combined in various ways.

Consider the following scenario involving access from remote sites to an
in-house corporate "expert" in the trading of financial instruments such as in the
securities market:



The focus of the scenario revolves around the activities of a trader who
is a specialist in securities. The setting is the start of his day at his desk in a major
financial center (NYC) at a major U.S. investment bank.

The Expert has been actively watching a particular security over the past
week and upon his arrival into the office, he notices it is on the rise. Before going home
last night, he set up his system to filter overnight news on a particular family
of securities and a security within that family. He scans the filtered news and sees a
story that may have a long-term impact on the security in question. He believes he
needs to act now in order to get a good price on the security. Also, through filtered
mail, he sees that his counterpart in London, who has also been watching this security, is
interested in getting our Expert's opinion once he arrives at work.

The Expert issues a multimedia mail message on the security to the head
of sales worldwide for use in working with their client base. Also among the recipients
are an analyst in the research department and his counterpart in London. The Expert, in
preparation for his previously established "on-call" office hours, consults with others
within the corporation (using the videoconferencing and other collaborative techniques
described above), accesses company records from his CMW, and analyzes such
information, employing software-assisted analytic techniques. His office hours are now
at hand, so he enters "intercom" mode, which enables incoming calls to appear
automatically (without requiring the Expert to "answer his phone" and elect to accept or
reject the call).

The Expert's computer beeps, indicating an incoming call, and the
image of a field representative 201 and his client 202 who are located at a bank branch
somewhere in the U.S. appears in video window 203 of the Expert's screen (shown in
Fig. 35). Note that, unless the call is converted to a "conference" call (whether
explicitly via a menu selection or implicitly by calling two or more other participants or
adding a third participant to a call), the callers will see only each other in the video
window and will not see themselves as part of a video mosaic.

Also illustrated on the Expert's screen in Fig. 35 is the Collaboration 
Initiator window 204 from which the Expert can (utilizing Collaboration Initiator 
software module 161 shown in Fig. 20) initiate and control various collaborative 
sessions. For example, the user can initiate with a selected participant a video call 
(CALL button) or the addition of that selected participant to an existing video call (ADD 
button), as well as a share session (SHARE button) using a selected window or region on 
the screen (or a blank region via the WHITEBOARD button for subsequent annotation).
The user can also invoke his MAIL software (MAIL button) and prepare outgoing or
check incoming Email messages (the presence of which is indicated by a picture of an
envelope in the dog's mouth in In Box icon 205), as well as check for "I called"
messages from other callers (MESSAGES button) left via the LEAVE WORD button in
video window 203. Video window 203 also contains buttons from which many of these
and certain additional features can be invoked, such as hanging up a video call
(HANGUP button), putting a call on hold (HOLD button), resuming a call previously put
on hold (RESUME button) or muting the audio portion of a call (MUTE button). In
addition, the user can invoke the recording of a conference by the conference RECORD
button. Also present on the Expert's screen is a standard desktop window 206 containing
icons from which other programs can be launched.

Returning to the example, the Expert is now engaged in a
videoconference with field representative 201 and his client 202. In the course of this
videoconference, as illustrated in Fig. 36, the field representative shares with the Expert
a graphical image 210 (pie chart of client portfolio holdings) of his client's portfolio
holdings (by clicking on his SHARE button, corresponding to the SHARE button in
video window 203 of the Expert's screen, and selecting that image from his screen,
resulting in the shared image appearing in the Share window 211 of the screen of all
participants to the share) and begins to discuss the client's investment dilemma. The
field representative also invokes a command to secretly bring up the client profile on the
Expert's screen.

After considering this information, reviewing the shared portfolio and
asking clarifying questions, the Expert illustrates his advice by creating (using his own
modelling software) and sharing a new graphical image 220 (Fig. 37) with the field
representative and his client. Either party to the share can annotate that image using the
drawing tools 221 (and the TEXT button, which permits typed characters to be
displayed) provided within Share window 211, or "regrab" a modified version of the
original image (by using the REGRAB button), or remove all such annotations (by using
the CLEAR button of Share window 211), or "grab" a new image to share (by clicking
on the GRAB button of Share window 211 and selecting that new image from the
screen). In addition, any participant to a shared session can add a new participant by
selecting that participant from the rolodex or quick-dial list (as described above for video
calls and for data conferencing) and clicking the ADD button of Share window 211. One
can also save the shared image (SAVE button), load a previously saved image to be
shared (LOAD button), or print an image (PRINT button).
While discussing the Expert's advice, field representative 201 makes
annotations 222 to image 220 in order to illustrate his concerns. While responding to the
concerns of field representative 201, the Expert hears a beep and receives a visual notice
(New Call window 223) on his screen (not visible to the field representative and his
client), indicating the existence of a new incoming call and identifying the caller. At this
point, the Expert can accept the new call (ACCEPT button), refuse the new call
(REFUSE button, which will result in a message being displayed on the caller's screen
indicating that the Expert is unavailable) or add the new caller to the Expert's existing
call (ADD button). In this case, the Expert elects yet another option (not shown): to
defer the call and leave the caller a standard message that the Expert will call back in X
minutes (in this case, 1 minute). The Expert then elects also to defer his existing call,
telling the field representative and his client that he will call them back in 5 minutes, and
then elects to return the initial deferred call.

It should be noted that the Expert's act of deferring a call results not
only in a message being sent to the caller, but also in the caller's name (and perhaps
other information associated with the call, such as the time the call was deferred or is to
be resumed) being displayed in a list 230 (see Fig. 38) on the Expert's screen from which
the call can be reinitiated. Moreover, the "state" of the call (e.g., the information being
shared) is retained so that it can be recreated when the call is reinitiated. Unlike a
"hold" (described above), deferring a call actually breaks the logical and physical
connections, requiring that the entire call be reinitiated by the Collaboration Initiator and
the AVNM as described above.
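The contrast between holding and deferring a call can be sketched as a small state model. This is a hedged Python illustration, not the Collaboration Initiator or AVNM themselves: the classes Call, DeferredEntry and CallManager, their fields, and the callback message text are all hypothetical. A hold only changes the call's status, whereas a defer saves a copy of the shared state, lists the caller (as list 230 does) and tears the connection down, so reinitiation builds a new call onto which the saved state is restored.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Call:
    caller: str
    shared_items: list = field(default_factory=list)  # e.g. shared images and annotations
    connected: bool = True   # logical and physical connections in place
    on_hold: bool = False


@dataclass
class DeferredEntry:
    caller: str
    resume_by: datetime
    saved_state: list


class CallManager:
    """Hypothetical sketch contrasting "hold" with "defer"."""

    def __init__(self) -> None:
        self.deferred: list = []  # stands in for deferred-call list 230

    def hold(self, call: Call) -> None:
        # Hold leaves the connections up; only the call's status changes.
        call.on_hold = True

    def resume(self, call: Call) -> None:
        call.on_hold = False

    def defer(self, call: Call, minutes: int) -> str:
        # Defer saves a copy of the call state, lists the caller, then
        # tears down the connections. Returns the message sent to the caller.
        self.deferred.append(DeferredEntry(
            caller=call.caller,
            resume_by=datetime.now() + timedelta(minutes=minutes),
            saved_state=list(call.shared_items),
        ))
        call.connected = False
        call.on_hold = False
        return f"Will call you back in {minutes} minute(s)."

    def reinitiate(self, caller: str) -> Call:
        # A completely new call is set up; the saved state is restored onto it.
        # Raises StopIteration if the caller is not in the deferred list.
        entry = next(e for e in self.deferred if e.caller == caller)
        self.deferred.remove(entry)
        return Call(caller=entry.caller, shared_items=list(entry.saved_state))
```

Copying the shared items both when deferring and when reinitiating keeps the restored call independent of the torn-down one, which mirrors the point that deferral breaks the connections while the state survives.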

Upon returning to the initial deferred call, the Expert engages in a
videoconference with caller 231, a research analyst who is located 10 floors up from the
Expert, with a complex question regarding a particular security. Caller 231 decides to
add London expert 232 to the videoconference (via the ADD button in Collaboration
Initiator window 204) to provide additional information regarding the factual history of
the security. Upon selecting the ADD button, video window 203 now displays, as
illustrated in Fig. 38, a video mosaic consisting of three smaller images (instead of a
single large image displaying only caller 231) of the Expert 233, caller 231 and London
expert 232.

During this videoconference, an urgent PRIORITY request (New Call
window 234) is received from the Expert's boss (who is engaged in a three-party
videoconference call with two members of the bank's operations department and is
attempting to add the Expert to that call to answer a quick question). The Expert puts his
three-party videoconference on hold (merely by clicking the HOLD button in video
window 203) and accepts (via the ACCEPT button of New Call window 234) the urgent
call from his boss, which results in the Expert being added to the boss' three-party
videoconference call.

As illustrated in Fig. 39, video window 203 is now replaced with a
four-person video mosaic representing a four-party conference call consisting of the
Expert 233, his boss 241 and the two members 242 and 243 of the bank's operations
department. The Expert quickly answers the boss' question and, by clicking on the
RESUME button (of video window 203) adjacent to the names of the other participants to
the call on hold, simultaneously hangs up on the conference call with his boss and
resumes his three-party conference call involving the securities issue, as illustrated in
video window 203 of Fig. 40.

While that call was on hold, however, analyst 231 and London expert
232 were still engaged in a two-way videoconference (with a blackened portion of the
video mosaic on their screens indicating that the Expert was on hold) and had shared and
annotated a graphical image 250 (see annotations 251 to image 250 of Fig. 40)
illustrating certain financial concerns. Once the Expert resumed the call, analyst 231
added the Expert to the share session, causing Share window 211 containing annotated
image 250 to appear on the Expert's screen. Optionally, snapshot sharing could progress
while the video was on hold.

Before concluding his conference regarding the securities, the Expert
receives notification of an incoming multimedia mail message, e.g., a beep
accompanied by the appearance of an envelope 252 in the dog's mouth in In Box icon
205 shown in Fig. 40. Once he concludes his call, he quickly scans his incoming
multimedia mail message by clicking on In Box icon 205, which invokes his mail
software, and then selecting the incoming message for a quick scan, as generally
illustrated in the top two windows of Fig. 2B. He decides it can wait for further review,
as the sender is an analyst other than the one helping on his security question.

He then reinitiates (by selecting deferred call indicator 230, shown in
Fig. 40) his deferred call with field representative 201 and his client 202, as shown in
Fig. 41. Note that the full state of the call is also recreated, including restoration of
previously shared image 220 with annotations 222 as they existed when the call was
deferred (see Fig. 37). Note also in Fig. 41 that, because the Expert has reviewed his only
unread incoming multimedia mail message, In Box icon 205 no longer shows an envelope in
the dog's mouth, indicating that the Expert currently has no unread incoming messages.
As the Expert continues to provide advice and pricing information to
field representative 201, he receives notification of three priority calls 261-263 in short
succession. Call 261 is from the Head of Sales for the Chicago office. Working at home, she
had instructed her CMW to alert her of all urgent news or messages, and was
subsequently alerted to the arrival of the Expert's earlier multimedia mail message. Call
262 is an urgent international call. Call 263 is from the Head of Sales in Los Angeles.
The Expert quickly winds down and then concludes his call with field representative 201.

The Expert notes from call indicator 262 that this call is an international
call (shown in the top portion of the New Call window), and he realizes it is
from a laptop user in the field in Central Mexico. The Expert elects to prioritize his calls
in the following manner: 262, 261 and 263. He therefore quickly answers call 261 (by
clicking on its ACCEPT button) and puts that call on hold while deferring call 263 in the
manner discussed above. He then proceeds to accept the call identified by international
call indicator 262.

Note in Fig. 42 deferred call indicator 271 and the indicator for the call
placed on hold (next to the highlighted RESUME button in video window 203), as well
as the image of caller 272 from the laptop in the field in Central Mexico. Although
Mexican caller 272 is outdoors and has no direct access to any wired telephone
connection, his laptop has two wireless modems permitting dial-up access to two data
connections in the nearest field office (through which his calls were routed). The system
automatically (based upon the laptop's registered service capabilities) allocated one
connection for an analog telephone voice call (using his laptop's built-in microphone and
speaker and the Expert's computer-integrated telephony capabilities) to provide audio
teleconferencing. The other connection provides control, data conferencing and one-way
digital video (i.e., the laptop user cannot see the image of the Expert) from the laptop's
built-in camera, albeit at a very slow frame rate (e.g., 3-10 small frames per second) due
to the relatively slow dial-up phone connection.
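The automatic, capability-based allocation of the laptop's two dial-up links could be expressed roughly as follows. This is a minimal Python sketch under stated assumptions: the Capabilities record and the allocate_links policy (voice on the first link, control, data and any video on the next) are hypothetical and merely reproduce the allocation described for caller 272; they are not taken from the system's specification.

```python
from dataclasses import dataclass


@dataclass
class Capabilities:
    """Hypothetical record of a terminal's registered service capabilities."""
    dialup_links: int
    has_microphone: bool
    has_camera: bool
    can_receive_video: bool


def allocate_links(caps: Capabilities) -> list:
    """Assign media streams to each dial-up link (illustrative policy only).

    Link 0 carries an analog voice call when a microphone is present;
    the next free link carries control and data conferencing, plus outgoing
    video if the terminal has a camera. Incoming video is assigned only
    if the terminal registers that it can receive it.
    """
    links = [[] for _ in range(caps.dialup_links)]
    next_link = 0
    if caps.has_microphone and next_link < len(links):
        links[next_link].append("analog voice")
        next_link += 1
    if next_link < len(links):
        links[next_link] += ["control", "data conferencing"]
        if caps.has_camera:
            links[next_link].append("outgoing video")
        if caps.can_receive_video:
            links[next_link].append("incoming video")
    return links


if __name__ == "__main__":
    laptop = Capabilities(dialup_links=2, has_microphone=True,
                          has_camera=True, can_receive_video=False)
    print(allocate_links(laptop))
```

For the laptop in this scenario the sketch yields one voice link and one link carrying control, data conferencing and outgoing video only, matching the one-way video described above.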

It is important to note that, despite the limited capabilities of the wireless
laptop equipment, the present system accommodates such capabilities, supplementing an
audio telephone connection with limited (i.e., relatively slow) one-way video and data
conferencing functionality. As telephony and video compression technologies improve,
the present system will accommodate such improvements automatically. Moreover, even
with one participant to a teleconference having limited capabilities, other participants
need not be reduced to this "lowest common denominator". For example, additional
participants could be added to the call illustrated in Fig. 42 as described above, and such
participants could have full videoconferencing, data conferencing and other collaborative
functionality vis-a-vis one another, while having limited functionality only with caller
272.

As his day evolved, the off-site salesperson 272 in Mexico was notified
by his manager through the laptop about a new security and became convinced that his
client would have particular interest in this issue. The salesperson therefore decided to
contact the Expert as shown in Fig. 42. While discussing the security issues, the
Expert again shares all captured graphs, charts, etc.

The salesperson 272 also needs the Expert's help on another issue. He
has only a hard copy of a client's portfolio and needs some advice on its composition
before he meets with the client tomorrow. He says he will fax it to the Expert for
analysis. Upon receiving the fax (on his CMW, via computer-integrated fax), the Expert
asks if he should either send the Mexican caller a "QuickTime" movie (a lower-quality
compressed video standard from Apple Computer) on his laptop tonight or send a
higher-quality CD via FedEx tomorrow, the notion being that the Expert can produce an
actual video presentation with models and annotations in video form. The salesperson
can then play it to his client tomorrow afternoon and it will be as if the Expert is in the
room. The Mexican caller decides he would prefer the CD.

Continuing with this scenario, the Expert learns, in the course of his call
with remote laptop caller 272, that he missed an important issue during his previous
quick scan of his incoming multimedia mail message. The Expert is upset that the
sender of the message did not utilize the "video highlight" feature to highlight this aspect
of the message. This feature permits the composer of the message to define "tags" (e.g.,
by clicking a TAG button, not shown) during record time, which are stored with the
message along with a "time stamp", and which cause a predefined or selectable audio
and/or visual indicator to be played/displayed at that precise point in the message during
playback.
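The "video highlight" tags can be pictured as time stamps stored alongside the recorded message and checked as playback advances. The Python sketch below is illustrative only: the TaggedMessage class, its method names and the default "beep" indicator are assumptions, not details given in the text.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class TaggedMessage:
    """Hypothetical multimedia message carrying time-stamped highlight tags."""
    tags: list = field(default_factory=list)  # kept sorted as (time_s, indicator)

    def add_tag(self, time_s: float, indicator: str = "beep") -> None:
        # Called when the composer clicks TAG during recording.
        bisect.insort(self.tags, (time_s, indicator))

    def indicators_between(self, start_s: float, end_s: float) -> list:
        # During playback, return the indicators whose time stamps fall in
        # the interval just played, [start_s, end_s).
        lo = bisect.bisect_left(self.tags, (start_s,))
        return [ind for t, ind in self.tags[lo:] if t < end_s]
```

A player would call indicators_between once for each consecutive playback interval, so that every tag fires once, at its recorded point in the message.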

Because this issue relates to the caller that the Expert has on hold, the
Expert decides to merge the two calls together by adding the call on hold to his existing
call. As noted above, both the Expert and the previously held caller will have full video
capabilities vis-a-vis one another and will see a three-way mosaic image (with the image
of caller 272 at a slower frame rate), whereas caller 272 will have access only to the
audio portion of this three-way conference call, though he will have data conferencing
functionality with both of the other participants.
The Expert forwards the multimedia mail message to both caller 272 and
the other participant, and all three of them review the video enclosure in greater detail
and discuss the concern raised by caller 272. They share certain relevant data as
described above and realize that they need to ask a quick question of another remote
expert. They add that expert to the call (resulting in the addition of a fourth image to the
video mosaic, also not shown) for less than a minute while they obtain a quick answer to
their question. They then continue their three-way call until the Expert provides his
advice and adjourns the call.

The Expert composes a new multimedia mail message, recording his
image and audio synchronized (as described above) to the screen displays resulting from
his simultaneous interaction with his CMW (e.g., running a program that performs
certain calculations and displays a graph while the Expert illustrates certain points by
telepointing on the screen, during which time his image and spoken words are also
captured). He sends this message to a number of salesforce recipients whose identities
are determined automatically by an outgoing mail filter that utilizes a database of
information on each potential recipient (e.g., selecting only those whose clients have
investment policies which allow this type of investment).

The Expert then receives an audio and visual reminder (not shown) that
a particular video feed (e.g., a short segment of a financial cable television show
featuring new financial instruments) will be triggered automatically in a few minutes. He
uses this time to search his local securities database, which is dynamically updated from
financial information feeds (e.g., prepared from a broadcast textual stream of current
financial events with indexed headers, to which data filters are automatically applied to
select incoming events relating to certain securities). The video feed is then displayed on
the Expert's screen and he watches this short video segment.

After analyzing this extremely up-to-date information, the Expert then
reinitiates his previously deferred call, from indicator 271 shown in Fig. 42, which he
knows is from the Head of Sales in Los Angeles, who is seeking to provide his prime
clients with securities advice on another securities transaction based upon the most recent
available information. The Expert's call is not answered directly, though he receives a
short prerecorded video message (left by the caller, who had to leave his home for a
meeting across town soon after his priority message was deferred) asking that the Expert
leave him a multimedia mail reply message with advice for a particular client, and
explaining that he will access this message remotely from his laptop as soon as his
meeting is concluded. The Expert complies with this request and composes and sends
this mail message.

The Expert then receives an audio and visual reminder on his screen
indicating that his office hours will end in two minutes. He switches from "intercom"
mode to "telephone" mode so that he will no longer be disturbed without an opportunity
to reject incoming calls via the New Call window described above. He then receives and
accepts a final call concerning an issue from an electronic meeting several months ago,
which was recorded in its entirety.

The Expert accesses this recorded meeting from his "corporate
memory". Using his fast forward controls, he searches the recorded meeting (which
appears in a second video window on his screen as would a live meeting, along with
standard controls for stop/play/rewind/fast forward/etc.) for an event that will trigger his
memory, but cannot locate the desired portion of the meeting. He then
elects to search the ASCII text log (which was automatically extracted in the background
after the meeting had been recorded, using the latest voice recognition techniques), but
still cannot locate the desired portion of the meeting. Finally, he applies an information
filter to perform a content-oriented (rather than literal) search and finds the portion of the
meeting he was seeking. After quickly reviewing this short portion of the previously
recorded meeting, the Expert responds to the caller's question, adjourns the call and
concludes his office hours.

It should be noted that the above scenario involves many state-of-the-art
desktop tools (e.g., video and information feeds, information filtering and voice
recognition) that can be leveraged by our Expert during videoconferencing, data
conferencing and other collaborative activities provided by the present system, because
this system, instead of providing a dedicated videoconferencing system, provides a
desktop multimedia collaboration system that integrates into the Expert's existing
workstation/LAN/WAN environment.

It should also be noted that all of the preceding collaborative activities in
this scenario took place during a relatively short portion of the Expert's day (e.g., less
than an hour of cumulative time) while the Expert remained in his office and continued to
utilize the tools and information available from his desktop. Previously, such a scenario
would not have been possible because many of these activities could have taken place
only with face-to-face collaboration, which in many circumstances is not feasible or
economical and which thus may well have resulted in a loss of the associated business
opportunities.
Many modifications and variations can be made in hardware, software,
operation, uses, protocols and data formats. For example, for certain applications, it
will be useful to provide some or all of the audio/video signals in digital form.

The application is divided out of application No. 9410665.5, Serial No.
2,282,506, to which reference should be made. Attention is also drawn to the following
applications likewise divided out of that application:

CLAIMS



1. A teleconferencing system for conducting a teleconference among a plurality of
participants comprising:

(a) a plurality of workstations each having monitors for displaying visual images
and associated AV capture and reproduction capabilities for capturing and
reproducing video images and spoken audio of the participants; and

(b) a common collaboration initiator for initiating a plurality of types of
collaboration among the plurality of participants, the types of collaboration being
selected from the set consisting of data conferencing, videoconferencing,
telephone conferencing, the sending of faxes and the sending of multimedia mail
messages, the common collaboration initiator including

(i) a callee selector for selecting one or more desired participants from a plurality of
potential participants; and

(ii) a collaboration type selector for selecting a desired collaboration type from
among the plurality of collaboration types.

2. The teleconferencing system of claim 1, the callee selector having:

(a) a first selector for selecting one or more desired participants from a first set of
the potential participants; and

(b) a second selector for selecting one or more desired participants from a second
set of potential participants, the second set being a subset of the first set.

3. The teleconferencing system of claim 2, wherein:

(a) the first selector includes names of the potential participants in the first set; and
(b) the second selector includes icons representing the potential participants in the
second set.

4. The teleconferencing system of claim 2, wherein the first and second selectors
have associated collaboration type selector buttons representing the collaboration types.

5. The teleconferencing system of claim 2, wherein the first and second selectors
appear in the same window on a workstation monitor.
6. The teleconferencing system of claim 1, wherein the common collaboration
initiator can be invoked by a combination of any one of the group consisting of: a user
action for selecting each of the desired participants, a user action for selecting the desired
collaboration type, and, if the desired collaboration type is not videoconferencing or
telephone conferencing, additional user action for selecting information to be sent to at
least one of the desired participants.

7. The teleconferencing system of claim 1, wherein the common collaboration
initiator can be invoked by a user action for selecting a desired participant and a default
collaboration type.

8. The teleconferencing system of claim 1, further comprising:

(a) a teleconferencing manager for managing a teleconference among a plurality of 
participants, and allowing at least one of the participants access to at least one 
multimedia service for providing audio and video signals to be reproduced at the 
workstation of another of the participants for receiving video images and spoken 
audio of the other participant. 

9. The teleconferencing system of claim 1, further comprising:

(a) an AV path for carrying AV signals among the workstations, the AV signals
representing video images and/or spoken audio of the participants;

(b) an AV conference manager for managing a videoconference during which the
video image and spoken audio of one of the participants is reproduced at the
workstation of another of the participants;

wherein the AV conference manager is operable to support a maximum number
of calls equal to N, where N is any integer, associated with a workstation; and

(c) a call selector which enables a participant, operating the workstation, when faced
with M possible calls where M is an integer greater than N, to select N calls of
the M possible calls.

10. The teleconferencing system of claim 9, further comprising means, operable by
the participant, to invoke further calls even if the AV conference manager is supporting
N active calls, and to give the participant the opportunity to select which calls are to be
active.
11. The teleconferencing system of claim 1, further comprising:

(a) a remote participant hold selection mechanism for placing on hold, in a
videoconference call among a hold-activating participant and a plurality of other
participants, at least one of the other participants.

12. The teleconferencing system of claim 1, further comprising:

(a) a remote participant disconnection mechanism for disconnecting, in a
teleconference call among a participant to be remotely disconnected and a
plurality of other participants, at least one of the other participants.

13. A teleconferencing system for conducting a teleconference among a plurality of
participants, comprising:

(a) a plurality of workstations associated with a corresponding plurality of
participants, each workstation including a monitor;

(b) AV capture and reproduction capabilities at each workstation for capturing and
reproducing video images and spoken audio of the participants; and

(c) an add participant selection mechanism for selecting a new participant from
among a plurality of potential participants and adding the new participant to an
active teleconference call.

14. A teleconferencing workstation, comprising:

(a) AV capture and reproduction means for capturing and reproducing video images
and spoken audio, the AV capture and reproduction means being coupled to a
bidirectional real-time AV port;

(b) telephone transducer means; and

(c) an incoming call acceptance mechanism for detecting an incoming teleconference
call at the workstation and, if the workstation user is engaged in an active
teleconference call, invoking the telephone transducer, whereby the user is
notified of and provided with the option of accepting the incoming
teleconference call.

15. A teleconferencing system for conducting a teleconference among a plurality of
participants having workstations with associated monitors for displaying visual images,
and with associated AV capture and reproduction capabilities for capturing and
reproducing video images and spoken audio of said participants, said workstations being
interconnected by a first network, said network providing a data path for carrying digital
data signals among said workstations, the teleconferencing system comprising:
(a) a common collaboration initiator for initiating a plurality of types of
collaboration among said plurality of participants, said types of collaboration including
data conferencing, videoconferencing, telephone conferencing, and the sending of faxes
and multimedia mail messages, said common collaboration initiator including

(i) a participant selector for selecting one or more desired participants from among
a plurality of potential participants; and

(ii) a collaboration type selector for selecting a desired collaboration type from
among said plurality of collaboration types.

16. A teleconferencing system for conducting a teleconference among a plurality of
participants having workstations with associated monitors for displaying visual images,
and with associated AV capture and reproduction capabilities for capturing and
reproducing video images and spoken audio of said participants, said workstations being
interconnected by a first network, said network providing a data path for carrying digital
data signals among said workstations, the teleconferencing system comprising:
(a) an add participant selection mechanism for selecting a new participant from
among a plurality of potential participants and adding said new participant to an active
teleconference call.